This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.16-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.16-rc1
Martin Schiller ms@dev.tdt.de net/x25: fix nonblocking connect
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: autoload modules from the abort path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: add __nft_chain_type_get()
Kadlecsik József kadlec@blackhole.kfki.hu netfilter: ipset: use bitmap infrastructure completely
Hans Verkuil hverkuil-cisco@xs4all.nl media: v4l2-ioctl.c: zero reserved fields for S/TRY_FMT
Wen Huang huangwenabc@gmail.com libertas: Fix two buffer overflows at parsing bss descriptor
Finn Thain fthain@telegraphics.com.au net/sonic: Prevent tx watchdog timeout
Finn Thain fthain@telegraphics.com.au net/sonic: Fix CAM initialization
Finn Thain fthain@telegraphics.com.au net/sonic: Fix command register usage
Finn Thain fthain@telegraphics.com.au net/sonic: Quiesce SONIC before re-initializing descriptor memory
Finn Thain fthain@telegraphics.com.au net/sonic: Fix receive buffer replenishment
Finn Thain fthain@telegraphics.com.au net/sonic: Improve receive descriptor status flag check
Finn Thain fthain@telegraphics.com.au net/sonic: Avoid needless receive descriptor EOL flag updates
Finn Thain fthain@telegraphics.com.au net/sonic: Fix receive buffer handling
Finn Thain fthain@telegraphics.com.au net/sonic: Fix interface error stats collection
Finn Thain fthain@telegraphics.com.au net/sonic: Use MMIO accessors
Finn Thain fthain@telegraphics.com.au net/sonic: Clear interrupt flags immediately
Finn Thain fthain@telegraphics.com.au net/sonic: Add mutual exclusion for accessing shared state
Linus Torvalds torvalds@linux-foundation.org readdir: be more conservative with directory entry names
Al Viro viro@zeniv.linux.org.uk do_last(): fetch directory ->i_mode and ->i_uid before it's too late
Jakub Sitnicki jakub@cloudflare.com net, sk_msg: Don't check if sock is locked when tearing down psock
Ulrich Weber ulrich.weber@gmail.com xfrm: support output_mark for offload ESP packets
Matthew Auld matthew.auld@intel.com drm/i915/userptr: fix size calculation
Johannes Berg johannes.berg@intel.com iwlwifi: mvm: fix potential SKB leak on TXQ TX
Johannes Berg johannes.berg@intel.com iwlwifi: mvm: fix SKB leak on invalid queue
Changbin Du changbin.du@intel.com tracing: xen: Ordered comparison of function pointers
Bart Van Assche bvanassche@acm.org scsi: RDMA/isert: Fix a recently introduced regression related to logout
Gilles Buloz gilles.buloz@kontron.com hwmon: (nct7802) Fix non-working alarm on voltages
Gilles Buloz gilles.buloz@kontron.com hwmon: (nct7802) Fix voltage limits to wrong registers
xiaofeng.yan yanxiaofeng7@jd.com hsr: Fix a compilation error
Jacek Anaszewski jacek.anaszewski@gmail.com leds: gpio: Fix uninitialized gpio label for fwnode based probe
Linus Torvalds torvalds@linux-foundation.org readdir: make user_access_begin() use the real access range
Shuah Khan skhan@linuxfoundation.org iommu/amd: Fix IOMMU perf counter clobbering during init
Christophe Leroy christophe.leroy@c-s.fr lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user()
Florian Westphal fw@strlen.de netfilter: nft_osf: add missing check for DREG attribute
Chuhong Yuan hslester96@gmail.com Input: sun4i-ts - add a check for devm_thermal_zone_of_sensor_register
Johan Hovold johan@kernel.org Input: pegasus_notetaker - fix endpoint sanity check
Johan Hovold johan@kernel.org Input: aiptek - fix endpoint sanity check
Johan Hovold johan@kernel.org Input: gtco - fix endpoint sanity check
Johan Hovold johan@kernel.org Input: sur40 - fix interface sanity checks
Stephan Gerhold stephan@gerhold.net Input: pm8xxx-vib - fix handling of separate enable register
Jakub Kicinski jakub.kicinski@netronome.com net/tls: fix async operation
Ido Schimmel idosch@mellanox.com mlxsw: switchx2: Do not modify cloned SKBs during xmit
Faiz Abbas faiz_abbas@ti.com mmc: sdhci_am654: Reset Command and Data line after tuning
Faiz Abbas faiz_abbas@ti.com mmc: sdhci_am654: Remove Inverted Write Protect flag
Michał Mirosław mirq-linux@rere.qmqm.pl mmc: sdhci: fix minimum clock rate for v3 controller
Michał Mirosław mirq-linux@rere.qmqm.pl mmc: tegra: fix SDR50 tuning override
Alex Sverdlin alexander.sverdlin@nokia.com ARM: 8950/1: ftrace/recordmcount: filter relocation types
Hans Verkuil hverkuil-cisco@xs4all.nl Revert "Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers"
Johan Hovold johan@kernel.org Input: keyspan-remote - fix control-message timeouts
Jerry Snitselaar jsnitsel@redhat.com iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer
Boyan Ding boyan.j.ding@gmail.com pinctrl: sunrisepoint: Add missing Interrupt Status register offset
Matthew Wilcox (Oracle) willy@infradead.org XArray: Fix xas_find returning too many entries
Matthew Wilcox (Oracle) willy@infradead.org XArray: Fix xa_find_after with multi-index entries
Matthew Wilcox (Oracle) willy@infradead.org XArray: Fix infinite loop with entry at ULONG_MAX
Emmanuel Grumbach emmanuel.grumbach@intel.com iwlwifi: mvm: don't send the IWL_MVM_RXQ_NSSN_SYNC notif to Rx queues
Mehmet Akif Tasova makiftasova@gmail.com Revert "iwlwifi: mvm: fix scan config command size"
Frederic Barrat fbarrat@linux.ibm.com powerpc/xive: Discard ESB load value when interrupt is invalid
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/mm/hash: Fix sharing context ids between kernel & userspace
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Fix histogram code when expression has same var as value
Masami Ichikawa masami256@gmail.com tracing: Do not set trace clock if tracefs lockdown is in effect
Masami Hiramatsu mhiramat@kernel.org tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
Masami Hiramatsu mhiramat@kernel.org tracing: trigger: Replace unneeded RCU-list traversals
Alexander Potapenko glider@google.com PM: hibernate: fix crashes with init_on_free=1
Tvrtko Ursulin tvrtko.ursulin@intel.com drm/i915: Align engine->uabi_class/instance with i915_drm.h
Boris Brezillon boris.brezillon@collabora.com drm/panfrost: Add the panfrost_gem_mapping concept
Alex Deucher alexander.deucher@amd.com PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken
Jeff Layton jlayton@kernel.org ceph: hold extra reference to r_parent over life of request
Guenter Roeck linux@roeck-us.net hwmon: (core) Do not use device managed functions for memory allocations
Luuk Paulussen luuk.paulussen@alliedtelesis.co.nz hwmon: (adt7475) Make volt2reg return same reg as reg2volt input
David Howells dhowells@redhat.com afs: Fix characters allowed into cell names
Jens Axboe axboe@kernel.dk Revert "io_uring: only allow submit from owning task"
David Ahern dsahern@gmail.com ipv4: Detect rollover in specific fib table dump
Tariq Toukan tariqt@mellanox.com net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path
Tariq Toukan tariqt@mellanox.com net/mlx5e: kTLS, Remove redundant posts in TX resync flow
Tariq Toukan tariqt@mellanox.com net/mlx5e: kTLS, Fix corner-case checks in TX resync flow
Erez Shitrit erezsh@mellanox.com net/mlx5: DR, use non preemptible call to get the current cpu number
Eli Cohen eli@mellanox.com net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep
Erez Shitrit erezsh@mellanox.com net/mlx5: DR, Enable counter on non-fwd-dest objects
Meir Lichtinger meirl@mellanox.com net/mlx5: Update the list of the PCI supported devices
Paul Blakey paulb@mellanox.com net/mlx5: Fix lowest FDB pool size
Maxim Mikityanskiy maximmi@mellanox.com net: Fix packet reordering caused by GRO and listified RX cooperation
Kristian Evensen kristian.evensen@gmail.com fou: Fix IPv6 netlink policy
Ido Schimmel idosch@mellanox.com mlxsw: spectrum_acl: Fix use-after-free during reload
Michael Ellerman mpe@ellerman.id.au airo: Add missing CAP_NET_ADMIN check in AIROOLDIOCTL/SIOCDEVPRIVATE
Michael Ellerman mpe@ellerman.id.au airo: Fix possible info leak in AIROOLDIOCTL/SIOCDEVPRIVATE
Eric Dumazet edumazet@google.com tun: add mutex_unlock() call and napi.skb clearing in tun_get_user()
Eric Dumazet edumazet@google.com tcp: do not leave dangling pointers in tp->highest_sack
Wen Yang wenyang@linux.alibaba.com tcp_bbr: improve arithmetic division in bbr_update_bw()
Paolo Abeni pabeni@redhat.com Revert "udp: do rmem bulk free even if the rx sk queue is empty"
James Hughes james.hughes@raspberrypi.org net: usb: lan78xx: Add .ndo_features_check
Jouni Hogander jouni.hogander@unikie.com net-sysfs: Fix reference count leak
Eric Dumazet edumazet@google.com net_sched: use validated TCA_KIND attribute in tc_new_tfilter()
Cong Wang xiyou.wangcong@gmail.com net_sched: fix datalen for ematch
Eric Dumazet edumazet@google.com net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()
William Dauchy w.dauchy@criteo.com net, ip_tunnel: fix namespaces move
William Dauchy w.dauchy@criteo.com net, ip6_tunnel: fix namespaces move
Niko Kortstrom niko.kortstrom@nokia.com net: ip6_gre: fix moving ip6gre between namespaces
Michael Ellerman mpe@ellerman.id.au net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM
Florian Fainelli f.fainelli@gmail.com net: bcmgenet: Use netif_tx_napi_add() for TX NAPI
Yuki Taguchi tagyounit@gmail.com ipv6: sr: remove SKB_GSO_IPXIP6 on End.D* actions
Eric Dumazet edumazet@google.com gtp: make sure only SOCK_DGRAM UDP sockets are accepted
Wenwen Wang wenwen@cs.uga.edu firestream: fix memory leaks
Richard Palethorpe rpalethorpe@suse.com can, slip: Protect tty->disc_data in write_wakeup and close with RCU
-------------
Diffstat:
Makefile | 4 +- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +- arch/powerpc/include/asm/xive-regs.h | 1 + arch/powerpc/sysdev/xive/common.c | 15 +- drivers/atm/firestream.c | 3 + drivers/gpu/drm/i915/gem/i915_gem_busy.c | 12 +- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 9 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 4 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 + drivers/gpu/drm/panfrost/panfrost_drv.c | 91 ++++- drivers/gpu/drm/panfrost/panfrost_gem.c | 124 ++++++- drivers/gpu/drm/panfrost/panfrost_gem.h | 41 ++- drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c | 3 +- drivers/gpu/drm/panfrost/panfrost_job.c | 13 +- drivers/gpu/drm/panfrost/panfrost_job.h | 1 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 61 ++-- drivers/gpu/drm/panfrost/panfrost_mmu.h | 6 +- drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 34 +- drivers/hwmon/adt7475.c | 5 +- drivers/hwmon/hwmon.c | 68 ++-- drivers/hwmon/nct7802.c | 75 +++- drivers/infiniband/ulp/isert/ib_isert.c | 12 - drivers/input/misc/keyspan_remote.c | 9 +- drivers/input/misc/pm8xxx-vibrator.c | 2 +- drivers/input/rmi4/rmi_smbus.c | 2 + drivers/input/tablet/aiptek.c | 6 +- drivers/input/tablet/gtco.c | 10 +- drivers/input/tablet/pegasus_notetaker.c | 2 +- drivers/input/touchscreen/sun4i-ts.c | 6 +- drivers/input/touchscreen/sur40.c | 2 +- drivers/iommu/amd_iommu_init.c | 24 +- drivers/iommu/intel-iommu.c | 3 +- drivers/leds/leds-gpio.c | 10 +- drivers/media/v4l2-core/v4l2-ioctl.c | 24 +- drivers/mmc/host/sdhci-tegra.c | 2 +- drivers/mmc/host/sdhci.c | 10 +- drivers/mmc/host/sdhci_am654.c | 27 +- drivers/net/can/slcan.c | 12 +- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 2 + .../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 49 +-- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 9 +- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 + .../ethernet/mellanox/mlx5/core/steering/dr_send.c | 3 +- .../ethernet/mellanox/mlx5/core/steering/fs_dr.c | 42 ++- drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c | 16 +- drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 17 +- drivers/net/ethernet/natsemi/sonic.c | 380 +++++++++++++-------- drivers/net/ethernet/natsemi/sonic.h | 44 ++- drivers/net/gtp.c | 10 +- drivers/net/slip/slip.c | 12 +- drivers/net/tun.c | 4 + drivers/net/usb/lan78xx.c | 15 + drivers/net/wireless/cisco/airo.c | 20 +- drivers/net/wireless/intel/iwlwifi/mvm/constants.h | 1 + drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 28 +- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 4 +- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 19 +- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 6 +- drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 4 +- drivers/net/wireless/marvell/libertas/cfg.c | 16 +- drivers/pci/quirks.c | 19 +- drivers/pinctrl/intel/pinctrl-sunrisepoint.c | 1 + drivers/target/iscsi/iscsi_target.c | 6 +- fs/afs/cell.c | 11 +- fs/ceph/mds_client.c | 8 +- fs/io_uring.c | 6 - fs/namei.c | 17 +- fs/readdir.c | 79 ++--- include/linux/netdevice.h | 2 + include/linux/netfilter/ipset/ip_set.h | 7 - include/linux/netfilter/nfnetlink.h | 2 +- include/net/netns/nftables.h | 1 + include/trace/events/xen.h | 6 +- kernel/power/snapshot.c | 20 +- kernel/trace/trace.c | 5 + kernel/trace/trace_events_hist.c | 63 +++- kernel/trace/trace_events_trigger.c | 20 +- kernel/trace/trace_kprobe.c | 2 +- kernel/trace/trace_probe.c | 5 +- kernel/trace/trace_probe.h | 3 +- kernel/trace/trace_uprobe.c | 124 ++++--- lib/strncpy_from_user.c | 14 +- lib/strnlen_user.c | 14 +- lib/test_xarray.c | 56 ++- lib/xarray.c | 33 +- net/core/dev.c | 97 +++--- net/core/rtnetlink.c | 13 +- net/core/skmsg.c | 2 - net/hsr/hsr_main.h | 2 +- net/ipv4/esp4_offload.c | 2 + net/ipv4/fib_trie.c | 6 + net/ipv4/fou.c | 4 +- net/ipv4/ip_tunnel.c | 4 +- net/ipv4/tcp.c | 1 + net/ipv4/tcp_bbr.c | 3 +- net/ipv4/tcp_input.c | 1 + net/ipv4/tcp_output.c | 1 + net/ipv4/udp.c | 3 +- net/ipv6/esp6_offload.c | 2 + net/ipv6/ip6_gre.c | 3 - net/ipv6/ip6_tunnel.c | 4 +- net/ipv6/seg6_local.c | 4 +- net/netfilter/ipset/ip_set_bitmap_gen.h | 2 +- net/netfilter/ipset/ip_set_bitmap_ip.c | 6 +- net/netfilter/ipset/ip_set_bitmap_ipmac.c | 6 +- net/netfilter/ipset/ip_set_bitmap_port.c | 6 +- net/netfilter/nf_tables_api.c | 155 ++++++--- net/netfilter/nfnetlink.c | 6 +- net/netfilter/nft_osf.c | 3 + net/sched/cls_api.c | 5 +- net/sched/ematch.c | 2 +- net/tls/tls_sw.c | 4 +- net/x25/af_x25.c | 6 +- scripts/recordmcount.c | 17 + 117 files changed, 1577 insertions(+), 767 deletions(-)
From: Richard Palethorpe rpalethorpe@suse.com
[ Upstream commit 0ace17d56824165c7f4c68785d6b58971db954dd ]
write_wakeup can happen in parallel with close/hangup where tty->disc_data is set to NULL and the netdevice is freed thus also freeing disc_data. write_wakeup accesses disc_data so we must prevent close from freeing the netdev while write_wakeup has a non-NULL view of tty->disc_data.
We also need to make sure that accesses to disc_data are atomic. Which can all be done with RCU.
This problem was found by Syzkaller on SLCAN, but the same issue is reproducible with the SLIP line discipline using an LTP test based on the Syzkaller reproducer.
A fix which didn't use RCU was posted by Hillf Danton.
Fixes: 661f7fda21b1 ("slip: Fix deadlock in write_wakeup") Fixes: a8e83b17536a ("slcan: Port write_wakeup deadlock fix from slip") Reported-by: syzbot+017e491ae13c0068598a@syzkaller.appspotmail.com Signed-off-by: Richard Palethorpe rpalethorpe@suse.com Cc: Wolfgang Grandegger wg@grandegger.com Cc: Marc Kleine-Budde mkl@pengutronix.de Cc: "David S. Miller" davem@davemloft.net Cc: Tyler Hall tylerwhall@gmail.com Cc: linux-can@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/slcan.c | 12 ++++++++++-- drivers/net/slip/slip.c | 12 ++++++++++-- 2 files changed, 20 insertions(+), 4 deletions(-)
--- a/drivers/net/can/slcan.c +++ b/drivers/net/can/slcan.c @@ -344,9 +344,16 @@ static void slcan_transmit(struct work_s */ static void slcan_write_wakeup(struct tty_struct *tty) { - struct slcan *sl = tty->disc_data; + struct slcan *sl; + + rcu_read_lock(); + sl = rcu_dereference(tty->disc_data); + if (!sl) + goto out;
schedule_work(&sl->tx_work); +out: + rcu_read_unlock(); }
/* Send a can_frame to a TTY queue. */ @@ -644,10 +651,11 @@ static void slcan_close(struct tty_struc return;
spin_lock_bh(&sl->lock); - tty->disc_data = NULL; + rcu_assign_pointer(tty->disc_data, NULL); sl->tty = NULL; spin_unlock_bh(&sl->lock);
+ synchronize_rcu(); flush_work(&sl->tx_work);
/* Flush network side */ --- a/drivers/net/slip/slip.c +++ b/drivers/net/slip/slip.c @@ -452,9 +452,16 @@ static void slip_transmit(struct work_st */ static void slip_write_wakeup(struct tty_struct *tty) { - struct slip *sl = tty->disc_data; + struct slip *sl; + + rcu_read_lock(); + sl = rcu_dereference(tty->disc_data); + if (!sl) + goto out;
schedule_work(&sl->tx_work); +out: + rcu_read_unlock(); }
static void sl_tx_timeout(struct net_device *dev) @@ -882,10 +889,11 @@ static void slip_close(struct tty_struct return;
spin_lock_bh(&sl->lock); - tty->disc_data = NULL; + rcu_assign_pointer(tty->disc_data, NULL); sl->tty = NULL; spin_unlock_bh(&sl->lock);
+ synchronize_rcu(); flush_work(&sl->tx_work);
/* VSV = very important to remove timers */
From: Wenwen Wang wenwen@cs.uga.edu
[ Upstream commit fa865ba183d61c1ec8cbcab8573159c3b72b89a4 ]
In fs_open(), 'vcc' is allocated through kmalloc() and assigned to 'atm_vcc->dev_data.' In the following execution, if an error occurs, e.g., there is no more free channel, an error code EBUSY or ENOMEM will be returned. However, 'vcc' is not deallocated, leading to memory leaks. Note that, in normal cases where fs_open() returns 0, 'vcc' will be deallocated in fs_close(). But, if fs_open() fails, there is no guarantee that fs_close() will be invoked.
To fix this issue, deallocate 'vcc' before the error code is returned.
Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/atm/firestream.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/atm/firestream.c +++ b/drivers/atm/firestream.c @@ -912,6 +912,7 @@ static int fs_open(struct atm_vcc *atm_v } if (!to) { printk ("No more free channels for FS50..\n"); + kfree(vcc); return -EBUSY; } vcc->channo = dev->channo; @@ -922,6 +923,7 @@ static int fs_open(struct atm_vcc *atm_v if (((DO_DIRECTION(rxtp) && dev->atm_vccs[vcc->channo])) || ( DO_DIRECTION(txtp) && test_bit (vcc->channo, dev->tx_inuse))) { printk ("Channel is in use for FS155.\n"); + kfree(vcc); return -EBUSY; } } @@ -935,6 +937,7 @@ static int fs_open(struct atm_vcc *atm_v tc, sizeof (struct fs_transmit_config)); if (!tc) { fs_dprintk (FS_DEBUG_OPEN, "fs: can't alloc transmit_config.\n"); + kfree(vcc); return -ENOMEM; }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 940ba14986657a50c15f694efca1beba31fa568f ]
A malicious user could use RAW sockets and fool GTP using them as standard SOCK_DGRAM UDP sockets.
BUG: KMSAN: uninit-value in udp_tunnel_encap_enable include/net/udp_tunnel.h:174 [inline] BUG: KMSAN: uninit-value in setup_udp_tunnel_sock+0x45e/0x6f0 net/ipv4/udp_tunnel.c:85 CPU: 0 PID: 11262 Comm: syz-executor613 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 udp_tunnel_encap_enable include/net/udp_tunnel.h:174 [inline] setup_udp_tunnel_sock+0x45e/0x6f0 net/ipv4/udp_tunnel.c:85 gtp_encap_enable_socket+0x37f/0x5a0 drivers/net/gtp.c:827 gtp_encap_enable drivers/net/gtp.c:844 [inline] gtp_newlink+0xfb/0x1e50 drivers/net/gtp.c:666 __rtnl_newlink net/core/rtnetlink.c:3305 [inline] rtnl_newlink+0x2973/0x3920 net/core/rtnetlink.c:3363 rtnetlink_rcv_msg+0x1153/0x1570 net/core/rtnetlink.c:5424 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x441359 Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff1cd0ac28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441359 RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000004002c8 R11: 0000000000000246 R12: 00000000004020d0 R13: 0000000000402160 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at: kmsan_save_stack_with_flags+0x3c/0x90 mm/kmsan/kmsan.c:144 kmsan_internal_alloc_meta_for_pages mm/kmsan/kmsan_shadow.c:307 [inline] kmsan_alloc_page+0x12a/0x310 mm/kmsan/kmsan_shadow.c:336 __alloc_pages_nodemask+0x57f2/0x5f60 mm/page_alloc.c:4800 alloc_pages_current+0x67d/0x990 mm/mempolicy.c:2207 alloc_pages include/linux/gfp.h:534 [inline] alloc_slab_page+0x111/0x12f0 mm/slub.c:1511 allocate_slab mm/slub.c:1656 [inline] new_slab+0x2bc/0x1130 mm/slub.c:1722 new_slab_objects mm/slub.c:2473 [inline] ___slab_alloc+0x1533/0x1f30 mm/slub.c:2624 __slab_alloc mm/slub.c:2664 [inline] slab_alloc_node mm/slub.c:2738 [inline] slab_alloc mm/slub.c:2783 [inline] kmem_cache_alloc+0xb23/0xd70 mm/slub.c:2788 sk_prot_alloc+0xf2/0x620 net/core/sock.c:1597 sk_alloc+0xf0/0xbe0 net/core/sock.c:1657 inet_create+0x7c7/0x1370 net/ipv4/af_inet.c:321 __sock_create+0x8eb/0xf00 net/socket.c:1420 sock_create net/socket.c:1471 [inline] __sys_socket+0x1a1/0x600 net/socket.c:1513 __do_sys_socket net/socket.c:1522 [inline] __se_sys_socket+0x8d/0xb0 net/socket.c:1520 __x64_sys_socket+0x4a/0x70 net/socket.c:1520 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Pablo Neira pablo@netfilter.org Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/gtp.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -804,19 +804,21 @@ static struct sock *gtp_encap_enable_soc return NULL; }
- if (sock->sk->sk_protocol != IPPROTO_UDP) { + sk = sock->sk; + if (sk->sk_protocol != IPPROTO_UDP || + sk->sk_type != SOCK_DGRAM || + (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)) { pr_debug("socket fd=%d not UDP\n", fd); sk = ERR_PTR(-EINVAL); goto out_sock; }
- lock_sock(sock->sk); - if (sock->sk->sk_user_data) { + lock_sock(sk); + if (sk->sk_user_data) { sk = ERR_PTR(-EBUSY); goto out_rel_sock; }
- sk = sock->sk; sock_hold(sk);
tuncfg.sk_user_data = gtp;
From: Yuki Taguchi tagyounit@gmail.com
[ Upstream commit 62ebaeaedee7591c257543d040677a60e35c7aec ]
After LRO/GRO is applied, SRv6 encapsulated packets have SKB_GSO_IPXIP6 feature flag, and this flag must be removed right after decapulation procedure.
Currently, SKB_GSO_IPXIP6 flag is not removed on End.D* actions, which creates inconsistent packet state, that is, a normal TCP/IP packets have the SKB_GSO_IPXIP6 flag. This behavior can cause unexpected fallback to GSO on routing to netdevices that do not support SKB_GSO_IPXIP6. For example, on inter-VRF forwarding, decapsulated packets separated into small packets by GSO because VRF devices do not support TSO for packets with SKB_GSO_IPXIP6 flag, and this degrades forwarding performance.
This patch removes encapsulation related GSO flags from the skb right after the End.D* action is applied.
Fixes: d7a669dd2f8b ("ipv6: sr: add helper functions for seg6local") Signed-off-by: Yuki Taguchi tagyounit@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/seg6_local.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -23,6 +23,7 @@ #include <net/addrconf.h> #include <net/ip6_route.h> #include <net/dst_cache.h> +#include <net/ip_tunnels.h> #ifdef CONFIG_IPV6_SEG6_HMAC #include <net/seg6_hmac.h> #endif @@ -135,7 +136,8 @@ static bool decap_and_validate(struct sk
skb_reset_network_header(skb); skb_reset_transport_header(skb); - skb->encapsulation = 0; + if (iptunnel_pull_offloads(skb)) + return false;
return true; }
From: Florian Fainelli f.fainelli@gmail.com
[ Upstream commit 148965df1a990af98b2c84092c2a2274c7489284 ]
Before commit 7587935cfa11 ("net: bcmgenet: move NAPI initialization to ring initialization") moved the code, this used to be netif_tx_napi_add(), but we lost that small semantic change in the process, restore that.
Fixes: 7587935cfa11 ("net: bcmgenet: move NAPI initialization to ring initialization") Signed-off-by: Florian Fainelli f.fainelli@gmail.com Acked-by: Doug Berger opendmb@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -2164,8 +2164,8 @@ static void bcmgenet_init_tx_ring(struct DMA_END_ADDR);
/* Initialize Tx NAPI */ - netif_napi_add(priv->dev, &ring->napi, bcmgenet_tx_poll, - NAPI_POLL_WEIGHT); + netif_tx_napi_add(priv->dev, &ring->napi, bcmgenet_tx_poll, + NAPI_POLL_WEIGHT); }
/* Initialize a RDMA ring */
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 3546d8f1bbe992488ed91592cf6bf76e7114791a =
The cxgb3 driver for "Chelsio T3-based gigabit and 10Gb Ethernet adapters" implements a custom ioctl as SIOCCHIOCTL/SIOCDEVPRIVATE in cxgb_extension_ioctl().
One of the subcommands of the ioctl is CHELSIO_GET_MEM, which appears to read memory directly out of the adapter and return it to userspace. It's not entirely clear what the contents of the adapter memory contains, but the assumption is that it shouldn't be accessible to all users.
So add a CAP_NET_ADMIN check to the CHELSIO_GET_MEM case. Put it after the is_offload() check, which matches two of the other subcommands in the same function which also check for is_offload() and CAP_NET_ADMIN.
Found by Ilja by code inspection, not tested as I don't have the required hardware.
Reported-by: Ilja Van Sprundel ivansprundel@ioactive.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -2448,6 +2448,8 @@ static int cxgb_extension_ioctl(struct n
if (!is_offload(adapter)) return -EOPNOTSUPP; + if (!capable(CAP_NET_ADMIN)) + return -EPERM; if (!(adapter->flags & FULL_INIT_DONE)) return -EIO; /* need the memory controllers */ if (copy_from_user(&t, useraddr, sizeof(t)))
From: Niko Kortstrom niko.kortstrom@nokia.com
[ Upstream commit 690afc165bb314354667f67157c1a1aea7dc797a ]
Support for moving IPv4 GRE tunnels between namespaces was added in commit b57708add314 ("gre: add x-netns support"). The respective change for IPv6 tunnels, commit 22f08069e8b4 ("ip6gre: add x-netns support") did not drop NETIF_F_NETNS_LOCAL flag so moving them from one netns to another is still denied in IPv6 case. Drop NETIF_F_NETNS_LOCAL flag from ip6gre tunnels to allow moving ip6gre tunnel endpoints between network namespaces.
Signed-off-by: Niko Kortstrom niko.kortstrom@nokia.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Acked-by: William Tu u9012063@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_gre.c | 3 --- 1 file changed, 3 deletions(-)
--- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -1466,7 +1466,6 @@ static int ip6gre_tunnel_init_common(str dev->mtu -= 8;
if (tunnel->parms.collect_md) { - dev->features |= NETIF_F_NETNS_LOCAL; netif_keep_dst(dev); } ip6gre_tnl_init_features(dev); @@ -1894,7 +1893,6 @@ static void ip6gre_tap_setup(struct net_ dev->needs_free_netdev = true; dev->priv_destructor = ip6gre_dev_free;
- dev->features |= NETIF_F_NETNS_LOCAL; dev->priv_flags &= ~IFF_TX_SKB_SHARING; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; netif_keep_dst(dev); @@ -2197,7 +2195,6 @@ static void ip6erspan_tap_setup(struct n dev->needs_free_netdev = true; dev->priv_destructor = ip6gre_dev_free;
- dev->features |= NETIF_F_NETNS_LOCAL; dev->priv_flags &= ~IFF_TX_SKB_SHARING; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; netif_keep_dst(dev);
From: William Dauchy w.dauchy@criteo.com
[ Upstream commit 5311a69aaca30fa849c3cc46fb25f75727fb72d0 ]
in the same manner as commit d0f418516022 ("net, ip_tunnel: fix namespaces move"), fix namespace moving as it was broken since commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnel"), but for ipv6 this time; there is no reason to keep it for ip6_tunnel.
Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnel") Signed-off-by: William Dauchy w.dauchy@criteo.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_tunnel.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1877,10 +1877,8 @@ static int ip6_tnl_dev_init(struct net_d if (err) return err; ip6_tnl_link_config(t); - if (t->parms.collect_md) { - dev->features |= NETIF_F_NETNS_LOCAL; + if (t->parms.collect_md) netif_keep_dst(dev); - } return 0; }
From: William Dauchy w.dauchy@criteo.com
[ Upstream commit d0f418516022c32ecceaf4275423e5bd3f8743a9 ]
in the same manner as commit 690afc165bb3 ("net: ip6_gre: fix moving ip6gre between namespaces"), fix namespace moving as it was broken since commit 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata."). Indeed, the ip6_gre commit removed the local flag for collect_md condition, so there is no reason to keep it for ip_gre/ip_tunnel.
this patch will fix both ip_tunnel and ip_gre modules.
Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: William Dauchy w.dauchy@criteo.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_tunnel.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -1236,10 +1236,8 @@ int ip_tunnel_init(struct net_device *de iph->version = 4; iph->ihl = 5;
- if (tunnel->collect_md) { - dev->features |= NETIF_F_NETNS_LOCAL; + if (tunnel->collect_md) netif_keep_dst(dev); - } return 0; } EXPORT_SYMBOL_GPL(ip_tunnel_init);
From: Eric Dumazet edumazet@google.com
[ Upstream commit d836f5c69d87473ff65c06a6123e5b2cf5e56f5b ]
rtnl_create_link() needs to apply dev->min_mtu and dev->max_mtu checks that we apply in do_setlink()
Otherwise malicious users can crash the kernel, for example after an integer overflow :
BUG: KASAN: use-after-free in memset include/linux/string.h:365 [inline] BUG: KASAN: use-after-free in __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 Write of size 32 at addr ffff88819f20b9c0 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x134/0x1a0 mm/kasan/generic.c:192 memset+0x24/0x40 mm/kasan/common.c:108 memset include/linux/string.h:365 [inline] __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 alloc_skb include/linux/skbuff.h:1049 [inline] alloc_skb_with_frags+0x93/0x590 net/core/skbuff.c:5664 sock_alloc_send_pskb+0x7ad/0x920 net/core/sock.c:2242 sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2259 mld_newpack+0x1d7/0x7f0 net/ipv6/mcast.c:1609 add_grhead.isra.0+0x299/0x370 net/ipv6/mcast.c:1713 add_grec+0x7db/0x10b0 net/ipv6/mcast.c:1844 mld_send_cr net/ipv6/mcast.c:1970 [inline] mld_ifc_timer_expire+0x3d3/0x950 net/ipv6/mcast.c:2477 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x6c3/0x1790 kernel/time/timer.c:1786 __do_softirq+0x262/0x98c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x19b/0x1e0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: 98 6b ea f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 44 1c 60 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 34 1c 60 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 4e 5d 9a f9 e8 79 RSP: 0018:ffffffff89807ce8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff13266ae RBX: ffffffff8987a1c0 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffffffff8987aa54 RBP: ffffffff89807d18 R08: ffffffff8987a1c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff8a799980 R14: 0000000000000000 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:690 default_idle_call+0x84/0xb0 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:361 rest_init+0x23b/0x371 init/main.c:451 arch_call_rest_init+0xe/0x1b start_kernel+0x904/0x943 init/main.c:784 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242
The buggy address belongs to the page: page:ffffea00067c82c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 raw: 057ffe0000000000 ffffea00067c82c8 ffffea00067c82c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88819f20b880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20b900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88819f20b980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^ ffff88819f20ba00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20ba80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/netdevice.h | 2 ++ net/core/dev.c | 29 +++++++++++++++++++---------- net/core/rtnetlink.c | 13 +++++++++++-- 3 files changed, 32 insertions(+), 12 deletions(-)
--- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3666,6 +3666,8 @@ int dev_set_alias(struct net_device *, c int dev_get_alias(const struct net_device *, char *, size_t); int dev_change_net_namespace(struct net_device *, struct net *, const char *); int __dev_set_mtu(struct net_device *, int); +int dev_validate_mtu(struct net_device *dev, int mtu, + struct netlink_ext_ack *extack); int dev_set_mtu_ext(struct net_device *dev, int mtu, struct netlink_ext_ack *extack); int dev_set_mtu(struct net_device *, int); --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7973,6 +7973,22 @@ int __dev_set_mtu(struct net_device *dev } EXPORT_SYMBOL(__dev_set_mtu);
+int dev_validate_mtu(struct net_device *dev, int new_mtu, + struct netlink_ext_ack *extack) +{ + /* MTU must be positive, and in range */ + if (new_mtu < 0 || new_mtu < dev->min_mtu) { + NL_SET_ERR_MSG(extack, "mtu less than device minimum"); + return -EINVAL; + } + + if (dev->max_mtu > 0 && new_mtu > dev->max_mtu) { + NL_SET_ERR_MSG(extack, "mtu greater than device maximum"); + return -EINVAL; + } + return 0; +} + /** * dev_set_mtu_ext - Change maximum transfer unit * @dev: device @@ -7989,16 +8005,9 @@ int dev_set_mtu_ext(struct net_device *d if (new_mtu == dev->mtu) return 0;
- /* MTU must be positive, and in range */ - if (new_mtu < 0 || new_mtu < dev->min_mtu) { - NL_SET_ERR_MSG(extack, "mtu less than device minimum"); - return -EINVAL; - } - - if (dev->max_mtu > 0 && new_mtu > dev->max_mtu) { - NL_SET_ERR_MSG(extack, "mtu greater than device maximum"); - return -EINVAL; - } + err = dev_validate_mtu(dev, new_mtu, extack); + if (err) + return err;
if (!netif_device_present(dev)) return -ENODEV; --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2959,8 +2959,17 @@ struct net_device *rtnl_create_link(stru dev->rtnl_link_ops = ops; dev->rtnl_link_state = RTNL_LINK_INITIALIZING;
- if (tb[IFLA_MTU]) - dev->mtu = nla_get_u32(tb[IFLA_MTU]); + if (tb[IFLA_MTU]) { + u32 mtu = nla_get_u32(tb[IFLA_MTU]); + int err; + + err = dev_validate_mtu(dev, mtu, extack); + if (err) { + free_netdev(dev); + return ERR_PTR(err); + } + dev->mtu = mtu; + } if (tb[IFLA_ADDRESS]) { memcpy(dev->dev_addr, nla_data(tb[IFLA_ADDRESS]), nla_len(tb[IFLA_ADDRESS]));
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 61678d28d4a45ef376f5d02a839cc37509ae9281 ]
syzbot reported an out-of-bound access in em_nbyte. As initially analyzed by Eric, this is because em_nbyte sets its own em->datalen in em_nbyte_change() other than the one specified by user, but this value gets overwritten later by its caller tcf_em_validate(). We should leave em->datalen untouched to respect their choices.
I audit all the in-tree ematch users, all of those implement ->change() set em->datalen, so we can just avoid setting it twice in this case.
Reported-and-tested-by: syzbot+5af9a90dad568aa9f611@syzkaller.appspotmail.com Reported-by: syzbot+2f07903a5b05e7f36410@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Eric Dumazet eric.dumazet@gmail.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/ematch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/ematch.c +++ b/net/sched/ematch.c @@ -263,12 +263,12 @@ static int tcf_em_validate(struct tcf_pr } em->data = (unsigned long) v; } + em->datalen = data_len; } }
em->matchid = em_hdr->matchid; em->flags = em_hdr->flags; - em->datalen = data_len; em->net = net;
err = 0;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 36d79af7fb59d6d9106feb9c1855eb93d6d53fe6 ]
sysbot found another issue in tc_new_tfilter(). We probably should use @name which contains the sanitized version of TCA_KIND.
BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:608 [inline] BUG: KMSAN: uninit-value in string+0x522/0x690 lib/vsprintf.c:689 CPU: 1 PID: 10753 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 string_nocheck lib/vsprintf.c:608 [inline] string+0x522/0x690 lib/vsprintf.c:689 vsnprintf+0x207d/0x31b0 lib/vsprintf.c:2574 __request_module+0x2ad/0x11c0 kernel/kmod.c:143 tcf_proto_lookup_ops+0x241/0x720 net/sched/cls_api.c:139 tcf_proto_create net/sched/cls_api.c:262 [inline] tc_new_tfilter+0x2a4e/0x5010 net/sched/cls_api.c:2058 rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45b349 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f88b3948c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f88b39496d4 RCX: 000000000045b349 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000099f R14: 00000000004cb163 R15: 000000000075bfd4
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2774 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 6f96c3c6904c ("net_sched: fix backward compatibility for TCA_KIND") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Jiri Pirko jiri@resnulli.us Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/cls_api.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -2055,9 +2055,8 @@ replay: &chain_info));
mutex_unlock(&chain->filter_chain_lock); - tp_new = tcf_proto_create(nla_data(tca[TCA_KIND]), - protocol, prio, chain, rtnl_held, - extack); + tp_new = tcf_proto_create(name, protocol, prio, chain, + rtnl_held, extack); if (IS_ERR(tp_new)) { err = PTR_ERR(tp_new); goto errout_tp;
From: Jouni Hogander jouni.hogander@unikie.com
[ Upstream commit cb626bf566eb4433318d35681286c494f04fedcc ]
Netdev_register_kobject is calling device_initialize. In case of error reference taken by device_initialize is not given up.
Drivers are supposed to call free_netdev in case of error. In non-error case the last reference is given up there and device release sequence is triggered. In error case this reference is kept and the release sequence is never started.
Fix this by setting reg_state as NETREG_UNREGISTERED if registering fails.
This is the rootcause for couple of memory leaks reported by Syzkaller:
BUG: memory leak unreferenced object 0xffff8880675ca008 (size 256): comm "netdev_register", pid 281, jiffies 4294696663 (age 6.808s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000058ca4711>] kmem_cache_alloc_trace+0x167/0x280 [<000000002340019b>] device_add+0x882/0x1750 [<000000001d588c3a>] netdev_register_kobject+0x128/0x380 [<0000000011ef5535>] register_netdevice+0xa1b/0xf00 [<000000007fcf1c99>] __tun_chr_ioctl+0x20d5/0x3dd0 [<000000006a5b7b2b>] tun_chr_ioctl+0x2f/0x40 [<00000000f30f834a>] do_vfs_ioctl+0x1c7/0x1510 [<00000000fba062ea>] ksys_ioctl+0x99/0xb0 [<00000000b1c1b8d2>] __x64_sys_ioctl+0x78/0xb0 [<00000000984cabb9>] do_syscall_64+0x16f/0x580 [<000000000bde033d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<00000000e6ca2d9f>] 0xffffffffffffffff
BUG: memory leak unreferenced object 0xffff8880668ba588 (size 8): comm "kobject_set_nam", pid 286, jiffies 4294725297 (age 9.871s) hex dump (first 8 bytes): 6e 72 30 00 cc be df 2b nr0....+ backtrace: [<00000000a322332a>] __kmalloc_track_caller+0x16e/0x290 [<00000000236fd26b>] kstrdup+0x3e/0x70 [<00000000dd4a2815>] kstrdup_const+0x3e/0x50 [<0000000049a377fc>] kvasprintf_const+0x10e/0x160 [<00000000627fc711>] kobject_set_name_vargs+0x5b/0x140 [<0000000019eeab06>] dev_set_name+0xc0/0xf0 [<0000000069cb12bc>] netdev_register_kobject+0xc8/0x320 [<00000000f2e83732>] register_netdevice+0xa1b/0xf00 [<000000009e1f57cc>] __tun_chr_ioctl+0x20d5/0x3dd0 [<000000009c560784>] tun_chr_ioctl+0x2f/0x40 [<000000000d759e02>] do_vfs_ioctl+0x1c7/0x1510 [<00000000351d7c31>] ksys_ioctl+0x99/0xb0 [<000000008390040a>] __x64_sys_ioctl+0x78/0xb0 [<0000000052d196b7>] do_syscall_64+0x16f/0x580 [<0000000019af9236>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<00000000bc384531>] 0xffffffffffffffff
v3 -> v4: Set reg_state to NETREG_UNREGISTERED if registering fails
v2 -> v3: * Replaced BUG_ON with WARN_ON in free_netdev and netdev_release
v1 -> v2: * Relying on driver calling free_netdev rather than calling put_device directly in error path
Reported-by: syzbot+ad8ca40ecd77896d51e2@syzkaller.appspotmail.com Cc: David Miller davem@davemloft.net Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Lukas Bulwahn lukas.bulwahn@gmail.com Signed-off-by: Jouni Hogander jouni.hogander@unikie.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -9082,8 +9082,10 @@ int register_netdevice(struct net_device goto err_uninit;
ret = netdev_register_kobject(dev); - if (ret) + if (ret) { + dev->reg_state = NETREG_UNREGISTERED; goto err_uninit; + } dev->reg_state = NETREG_REGISTERED;
__netdev_update_features(dev);
From: James Hughes james.hughes@raspberrypi.org
[ Upstream commit ce896476c65d72b4b99fa09c2f33436b4198f034 ]
As reported by Eric Dumazet, there are still some outstanding cases where the driver does not handle TSO correctly when skb's are over a certain size. Most cases have been fixed, this patch should ensure that forwarded SKB's that are greater than MAX_SINGLE_PACKET_SIZE - TX_OVERHEAD are software segmented and handled correctly.
Signed-off-by: James Hughes james.hughes@raspberrypi.org Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/lan78xx.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -20,6 +20,7 @@ #include <linux/mdio.h> #include <linux/phy.h> #include <net/ip6_checksum.h> +#include <net/vxlan.h> #include <linux/interrupt.h> #include <linux/irqdomain.h> #include <linux/irq.h> @@ -3668,6 +3669,19 @@ static void lan78xx_tx_timeout(struct ne tasklet_schedule(&dev->bh); }
+static netdev_features_t lan78xx_features_check(struct sk_buff *skb, + struct net_device *netdev, + netdev_features_t features) +{ + if (skb->len + TX_OVERHEAD > MAX_SINGLE_PACKET_SIZE) + features &= ~NETIF_F_GSO_MASK; + + features = vlan_features_check(skb, features); + features = vxlan_features_check(skb, features); + + return features; +} + static const struct net_device_ops lan78xx_netdev_ops = { .ndo_open = lan78xx_open, .ndo_stop = lan78xx_stop, @@ -3681,6 +3695,7 @@ static const struct net_device_ops lan78 .ndo_set_features = lan78xx_set_features, .ndo_vlan_rx_add_vid = lan78xx_vlan_rx_add_vid, .ndo_vlan_rx_kill_vid = lan78xx_vlan_rx_kill_vid, + .ndo_features_check = lan78xx_features_check, };
static void lan78xx_stat_monitor(struct timer_list *t)
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit d39ca2590d10712f412add7a88e1dd467a7246f4 ]
This reverts commit 0d4a6608f68c7532dcbfec2ea1150c9761767d03.
Willem reported that after commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty") the memory allocated by an almost idle system with many UDP sockets can grow a lot.
For stable kernel keep the solution as simple as possible and revert the offending commit.
Reported-by: Willem de Bruijn willemdebruijn.kernel@gmail.com Diagnosed-by: Eric Dumazet eric.dumazet@gmail.com Fixes: 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty") Signed-off-by: Paolo Abeni pabeni@redhat.com Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/udp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1368,7 +1368,8 @@ static void udp_rmem_release(struct sock if (likely(partial)) { up->forward_deficit += size; size = up->forward_deficit; - if (size < (sk->sk_rcvbuf >> 2)) + if (size < (sk->sk_rcvbuf >> 2) && + !skb_queue_empty(&up->reader_queue)) return; } else { size += up->forward_deficit;
From: Wen Yang wenyang@linux.alibaba.com
[ Upstream commit 5b2f1f3070b6447b76174ea8bfb7390dc6253ebd ]
do_div() does a 64-by-32 division. Use div64_long() instead of it if the divisor is long, to avoid truncation to 32-bit. And as a nice side effect also cleans up the function a bit.
Signed-off-by: Wen Yang wenyang@linux.alibaba.com Cc: Eric Dumazet edumazet@google.com Cc: "David S. Miller" davem@davemloft.net Cc: Alexey Kuznetsov kuznet@ms2.inr.ac.ru Cc: Hideaki YOSHIFUJI yoshfuji@linux-ipv6.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_bbr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -779,8 +779,7 @@ static void bbr_update_bw(struct sock *s * bandwidth sample. Delivered is in packets and interval_us in uS and * ratio will be <<1 for most connections. So delivered is first scaled. */ - bw = (u64)rs->delivered * BW_UNIT; - do_div(bw, rs->interval_us); + bw = div64_long((u64)rs->delivered * BW_UNIT, rs->interval_us);
/* If this sample is application-limited, it is likely to have a very * low delivered count that represents application behavior rather than
From: Eric Dumazet edumazet@google.com
[ Upstream commit 2bec445f9bf35e52e395b971df48d3e1e5dc704a ]
Latest commit 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq") apparently allowed syzbot to trigger various crashes in TCP stack [1]
I believe this commit only made things easier for syzbot to find its way into triggering use-after-frees. But really the bugs could lead to bad TCP behavior or even plain crashes even for non malicious peers.
I have audited all calls to tcp_rtx_queue_unlink() and tcp_rtx_queue_unlink_and_free() and made sure tp->highest_sack would be updated if we are removing from rtx queue the skb that tp->highest_sack points to.
These updates were missing in three locations :
1) tcp_clean_rtx_queue() [This one seems quite serious, I have no idea why this was not caught earlier]
2) tcp_rtx_queue_purge() [Probably not a big deal for normal operations]
3) tcp_send_synack() [Probably not a big deal for normal operations]
[1] BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1864 [inline] BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1856 [inline] BUG: KASAN: use-after-free in tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891 Read of size 4 at addr ffff8880a488d068 by task ksoftirqd/1/16
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:134 tcp_highest_sack_seq include/net/tcp.h:1864 [inline] tcp_highest_sack_seq include/net/tcp.h:1856 [inline] tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891 tcp_try_undo_partial net/ipv4/tcp_input.c:2730 [inline] tcp_fastretrans_alert+0xf74/0x23f0 net/ipv4/tcp_input.c:2847 tcp_ack+0x2577/0x5bf0 net/ipv4/tcp_input.c:3710 tcp_rcv_established+0x6dd/0x1e90 net/ipv4/tcp_input.c:5706 tcp_v4_do_rcv+0x619/0x8d0 net/ipv4/tcp_ipv4.c:1619 tcp_v4_rcv+0x307f/0x3b40 net/ipv4/tcp_ipv4.c:2001 ip_protocol_deliver_rcu+0x5a/0x880 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x23b/0x380 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x1db/0x2f0 net/ipv4/ip_input.c:428 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:538 __netif_receive_skb_one_core+0x113/0x1a0 net/core/dev.c:5148 __netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5262 process_backlog+0x206/0x750 net/core/dev.c:6093 napi_poll net/core/dev.c:6530 [inline] net_rx_action+0x508/0x1120 net/core/dev.c:6598 __do_softirq+0x262/0x98c kernel/softirq.c:292 run_ksoftirqd kernel/softirq.c:603 [inline] run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595 smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165 kthread+0x361/0x430 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Allocated by task 10091: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521 slab_post_alloc_hook mm/slab.h:584 [inline] slab_alloc_node mm/slab.c:3263 [inline] kmem_cache_alloc_node+0x138/0x740 mm/slab.c:3575 __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:198 alloc_skb_fclone include/linux/skbuff.h:1099 [inline] sk_stream_alloc_skb net/ipv4/tcp.c:875 [inline] sk_stream_alloc_skb+0x113/0xc90 net/ipv4/tcp.c:852 tcp_sendmsg_locked+0xcf9/0x3470 net/ipv4/tcp.c:1282 tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1432 inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 __sys_sendto+0x262/0x380 net/socket.c:1998 __do_sys_sendto net/socket.c:2010 [inline] __se_sys_sendto net/socket.c:2006 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:2006 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 10095: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kmem_cache_free+0x86/0x320 mm/slab.c:3694 kfree_skbmem+0x178/0x1c0 net/core/skbuff.c:645 __kfree_skb+0x1e/0x30 net/core/skbuff.c:681 sk_eat_skb include/net/sock.h:2453 [inline] tcp_recvmsg+0x1252/0x2930 net/ipv4/tcp.c:2166 inet_recvmsg+0x136/0x610 net/ipv4/af_inet.c:838 sock_recvmsg_nosec net/socket.c:886 [inline] sock_recvmsg net/socket.c:904 [inline] sock_recvmsg+0xce/0x110 net/socket.c:900 __sys_recvfrom+0x1ff/0x350 net/socket.c:2055 __do_sys_recvfrom net/socket.c:2073 [inline] __se_sys_recvfrom net/socket.c:2069 [inline] __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:2069 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8880a488d040 which belongs to the cache skbuff_fclone_cache of size 456 The buggy address is located 40 bytes inside of 456-byte region [ffff8880a488d040, ffff8880a488d208) The buggy address belongs to the page: page:ffffea0002922340 refcount:1 mapcount:0 mapping:ffff88821b057000 index:0x0 raw: 00fffe0000000200 ffffea00022a5788 ffffea0002624a48 ffff88821b057000 raw: 0000000000000000 ffff8880a488d040 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8880a488cf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880a488cf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880a488d000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^ ffff8880a488d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a488d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 853697504de0 ("tcp: Fix highest_sack and highest_sack_seq") Fixes: 50895b9de1d3 ("tcp: highest_sack fix") Fixes: 737ff314563c ("tcp: use sequence distance to detect reordering") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Cambda Zhu cambda@linux.alibaba.com Cc: Yuchung Cheng ycheng@google.com Cc: Neal Cardwell ncardwell@google.com Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Yuchung Cheng ycheng@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp.c | 1 + net/ipv4/tcp_input.c | 1 + net/ipv4/tcp_output.c | 1 + 3 files changed, 3 insertions(+)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2520,6 +2520,7 @@ static void tcp_rtx_queue_purge(struct s { struct rb_node *p = rb_first(&sk->tcp_rtx_queue);
+ tcp_sk(sk)->highest_sack = NULL; while (p) { struct sk_buff *skb = rb_to_skb(p);
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3164,6 +3164,7 @@ static int tcp_clean_rtx_queue(struct so tp->retransmit_skb_hint = NULL; if (unlikely(skb == tp->lost_skb_hint)) tp->lost_skb_hint = NULL; + tcp_highest_sack_replace(sk, skb, next); tcp_rtx_queue_unlink_and_free(skb, sk); }
--- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3231,6 +3231,7 @@ int tcp_send_synack(struct sock *sk) if (!nskb) return -ENOMEM; INIT_LIST_HEAD(&nskb->tcp_tsorted_anchor); + tcp_highest_sack_replace(sk, skb, nskb); tcp_rtx_queue_unlink_and_free(skb, sk); __skb_header_release(nskb); tcp_rbtree_insert(&sk->tcp_rtx_queue, nskb);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 1efba987c48629c0c64703bb4ea76ca1a3771d17 ]
If both IFF_NAPI_FRAGS mode and XDP are enabled, and the XDP program consumes the skb, we need to clear the napi.skb (or risk a use-after-free) and release the mutex (or risk a deadlock)
WARNING: lock held when returning to user space! 5.5.0-rc6-syzkaller #0 Not tainted ------------------------------------------------ syz-executor.0/455 is leaving the kernel with locks still held! 1 lock held by syz-executor.0/455: #0: ffff888098f6e748 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x1604/0x3fc0 drivers/net/tun.c:1835
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Petar Penkov ppenkov@google.com Cc: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1936,6 +1936,10 @@ drop: if (ret != XDP_PASS) { rcu_read_unlock(); local_bh_enable(); + if (frags) { + tfile->napi.skb = NULL; + mutex_unlock(&tfile->napi_mutex); + } return total_len; } }
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit d6bce2137f5d6bb1093e96d2f801479099b28094 ]
The driver for Cisco Aironet 4500 and 4800 series cards (airo.c), implements AIROOLDIOCTL/SIOCDEVPRIVATE in airo_ioctl().
The ioctl handler copies an aironet_ioctl struct from userspace, which includes a command and a length. Some of the commands are handled in readrids(), which kmalloc()'s a buffer of RIDSIZE (2048) bytes.
That buffer is then passed to PC4500_readrid(), which has two cases. The else case does some setup and then reads up to RIDSIZE bytes from the hardware into the kmalloc()'ed buffer.
Here len == RIDSIZE, pBuf is the kmalloc()'ed buffer:
// read the rid length field bap_read(ai, pBuf, 2, BAP1); // length for remaining part of rid len = min(len, (int)le16_to_cpu(*(__le16*)pBuf)) - 2; ... // read remainder of the rid rc = bap_read(ai, ((__le16*)pBuf)+1, len, BAP1);
PC4500_readrid() then returns to readrids() which does:
len = comp->len; if (copy_to_user(comp->data, iobuf, min(len, (int)RIDSIZE))) {
Where comp->len is the user controlled length field.
So if the "rid length field" returned by the hardware is < 2048, and the user requests 2048 bytes in comp->len, we will leak the previous contents of the kmalloc()'ed buffer to userspace.
Fix it by kzalloc()'ing the buffer.
Found by Ilja by code inspection, not tested as I don't have the required hardware.
Reported-by: Ilja Van Sprundel ivansprundel@ioactive.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/cisco/airo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/cisco/airo.c +++ b/drivers/net/wireless/cisco/airo.c @@ -7813,7 +7813,7 @@ static int readrids(struct net_device *d return -EINVAL; }
- if ((iobuf = kmalloc(RIDSIZE, GFP_KERNEL)) == NULL) + if ((iobuf = kzalloc(RIDSIZE, GFP_KERNEL)) == NULL) return -ENOMEM;
PC4500_readrid(ai,ridcode,iobuf,RIDSIZE, 1);
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 78f7a7566f5eb59321e99b55a6fdb16ea05b37d1 ]
The driver for Cisco Aironet 4500 and 4800 series cards (airo.c), implements AIROOLDIOCTL/SIOCDEVPRIVATE in airo_ioctl().
The ioctl handler copies an aironet_ioctl struct from userspace, which includes a command. Some of the commands are handled in readrids(), where the user controlled command is converted into a driver-internal value called "ridcode".
There are two command values, AIROGWEPKTMP and AIROGWEPKNV, which correspond to ridcode values of RID_WEP_TEMP and RID_WEP_PERM respectively. These commands both have checks that the user has CAP_NET_ADMIN, with the comment that "Only super-user can read WEP keys", otherwise they return -EPERM.
However there is another command value, AIRORRID, that lets the user specify the ridcode value directly, with no other checks. This means the user can bypass the CAP_NET_ADMIN check on AIROGWEPKTMP and AIROGWEPKNV.
Fix it by moving the CAP_NET_ADMIN check out of the command handling and instead do it later based on the ridcode. That way regardless of whether the ridcode is set via AIROGWEPKTMP or AIROGWEPKNV, or passed in using AIRORID, we always do the CAP_NET_ADMIN check.
Found by Ilja by code inspection, not tested as I don't have the required hardware.
Reported-by: Ilja Van Sprundel ivansprundel@ioactive.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/cisco/airo.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)
--- a/drivers/net/wireless/cisco/airo.c +++ b/drivers/net/wireless/cisco/airo.c @@ -7790,16 +7790,8 @@ static int readrids(struct net_device *d case AIROGVLIST: ridcode = RID_APLIST; break; case AIROGDRVNAM: ridcode = RID_DRVNAME; break; case AIROGEHTENC: ridcode = RID_ETHERENCAP; break; - case AIROGWEPKTMP: ridcode = RID_WEP_TEMP; - /* Only super-user can read WEP keys */ - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - break; - case AIROGWEPKNV: ridcode = RID_WEP_PERM; - /* Only super-user can read WEP keys */ - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - break; + case AIROGWEPKTMP: ridcode = RID_WEP_TEMP; break; + case AIROGWEPKNV: ridcode = RID_WEP_PERM; break; case AIROGSTAT: ridcode = RID_STATUS; break; case AIROGSTATSD32: ridcode = RID_STATSDELTA; break; case AIROGSTATSC32: ridcode = RID_STATS; break; @@ -7813,6 +7805,12 @@ static int readrids(struct net_device *d return -EINVAL; }
+ if (ridcode == RID_WEP_TEMP || ridcode == RID_WEP_PERM) { + /* Only super-user can read WEP keys */ + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + } + if ((iobuf = kzalloc(RIDSIZE, GFP_KERNEL)) == NULL) return -ENOMEM;
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit 971de2e572118c1128bff295341e37b6c8b8f108 ]
During reload (or module unload), the router block is de-initialized. Among other things, this results in the removal of a default multicast route from each active virtual router (VRF). These default routes are configured during initialization to trap packets to the CPU. In Spectrum-2, unlike Spectrum-1, multicast routes are implemented using ACL rules.
Since the router block is de-initialized before the ACL block, it is possible that the ACL rules corresponding to the default routes are deleted while being accessed by the ACL delayed work that queries rules' activity from the device. This can result in a rare use-after-free [1].
Fix this by protecting the rules list accessed by the delayed work with a lock. We cannot use a spinlock as the activity read operation is blocking.
[1] [ 123.331662] ================================================================== [ 123.339920] BUG: KASAN: use-after-free in mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0 [ 123.349381] Read of size 8 at addr ffff8881f3bb4520 by task kworker/0:2/78 [ 123.357080] [ 123.358773] CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 5.5.0-rc5-custom-33108-gf5df95d3ef41 #2209 [ 123.368898] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018 [ 123.378456] Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work [ 123.385970] Call Trace: [ 123.388734] dump_stack+0xc6/0x11e [ 123.392568] print_address_description.constprop.4+0x21/0x340 [ 123.403236] __kasan_report.cold.8+0x76/0xb1 [ 123.414884] kasan_report+0xe/0x20 [ 123.418716] mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0 [ 123.444034] process_one_work+0xb06/0x19a0 [ 123.453731] worker_thread+0x91/0xe90 [ 123.467348] kthread+0x348/0x410 [ 123.476847] ret_from_fork+0x24/0x30 [ 123.480863] [ 123.482545] Allocated by task 73: [ 123.486273] save_stack+0x19/0x80 [ 123.490000] __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 123.495379] mlxsw_sp_acl_rule_create+0xa7/0x230 [ 123.500566] mlxsw_sp2_mr_tcam_route_create+0xf6/0x3e0 [ 123.506334] mlxsw_sp_mr_tcam_route_create+0x5b4/0x820 [ 123.512102] mlxsw_sp_mr_table_create+0x3b5/0x690 [ 123.517389] mlxsw_sp_vr_get+0x289/0x4d0 [ 123.521797] mlxsw_sp_fib_node_get+0xa2/0x990 [ 123.526692] mlxsw_sp_router_fib4_event_work+0x54c/0x2d60 [ 123.532752] process_one_work+0xb06/0x19a0 [ 123.537352] worker_thread+0x91/0xe90 [ 123.541471] kthread+0x348/0x410 [ 123.545103] ret_from_fork+0x24/0x30 [ 123.549113] [ 123.550795] Freed by task 518: [ 123.554231] save_stack+0x19/0x80 [ 123.557958] __kasan_slab_free+0x125/0x170 [ 123.562556] kfree+0xd7/0x3a0 [ 123.565895] mlxsw_sp_acl_rule_destroy+0x63/0xd0 [ 123.571081] mlxsw_sp2_mr_tcam_route_destroy+0xd5/0x130 [ 123.576946] mlxsw_sp_mr_tcam_route_destroy+0xba/0x260 [ 123.582714] mlxsw_sp_mr_table_destroy+0x1ab/0x290 [ 123.588091] mlxsw_sp_vr_put+0x1db/0x350 [ 123.592496] mlxsw_sp_fib_node_put+0x298/0x4c0 [ 123.597486] mlxsw_sp_vr_fib_flush+0x15b/0x360 [ 123.602476] mlxsw_sp_router_fib_flush+0xba/0x470 [ 123.607756] mlxsw_sp_vrs_fini+0xaa/0x120 [ 123.612260] mlxsw_sp_router_fini+0x137/0x384 [ 123.617152] mlxsw_sp_fini+0x30a/0x4a0 [ 123.621374] mlxsw_core_bus_device_unregister+0x159/0x600 [ 123.627435] mlxsw_devlink_core_bus_device_reload_down+0x7e/0xb0 [ 123.634176] devlink_reload+0xb4/0x380 [ 123.638391] devlink_nl_cmd_reload+0x610/0x700 [ 123.643382] genl_rcv_msg+0x6a8/0xdc0 [ 123.647497] netlink_rcv_skb+0x134/0x3a0 [ 123.651904] genl_rcv+0x29/0x40 [ 123.655436] netlink_unicast+0x4d4/0x700 [ 123.659843] netlink_sendmsg+0x7c0/0xc70 [ 123.664251] __sys_sendto+0x265/0x3c0 [ 123.668367] __x64_sys_sendto+0xe2/0x1b0 [ 123.672773] do_syscall_64+0xa0/0x530 [ 123.676892] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 123.682552] [ 123.684238] The buggy address belongs to the object at ffff8881f3bb4500 [ 123.684238] which belongs to the cache kmalloc-128 of size 128 [ 123.698261] The buggy address is located 32 bytes inside of [ 123.698261] 128-byte region [ffff8881f3bb4500, ffff8881f3bb4580) [ 123.711303] The buggy address belongs to the page: [ 123.716682] page:ffffea0007ceed00 refcount:1 mapcount:0 mapping:ffff888236403500 index:0x0 [ 123.725958] raw: 0200000000000200 dead000000000100 dead000000000122 ffff888236403500 [ 123.734646] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 [ 123.743315] page dumped because: kasan: bad access detected [ 123.749562] [ 123.751241] Memory state around the buggy address: [ 123.756620] ffff8881f3bb4400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 123.764716] ffff8881f3bb4480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 123.772812] >ffff8881f3bb4500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 123.780904] ^ [ 123.785697] ffff8881f3bb4580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 123.793793] ffff8881f3bb4600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 123.801883] ==================================================================
Fixes: cf7221a4f5a5 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2") Signed-off-by: Ido Schimmel idosch@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c @@ -8,6 +8,7 @@ #include <linux/string.h> #include <linux/rhashtable.h> #include <linux/netdevice.h> +#include <linux/mutex.h> #include <net/net_namespace.h> #include <net/tc_act/tc_vlan.h>
@@ -25,6 +26,7 @@ struct mlxsw_sp_acl { struct mlxsw_sp_fid *dummy_fid; struct rhashtable ruleset_ht; struct list_head rules; + struct mutex rules_lock; /* Protects rules list */ struct { struct delayed_work dw; unsigned long interval; /* ms */ @@ -701,7 +703,9 @@ int mlxsw_sp_acl_rule_add(struct mlxsw_s goto err_ruleset_block_bind; }
+ mutex_lock(&mlxsw_sp->acl->rules_lock); list_add_tail(&rule->list, &mlxsw_sp->acl->rules); + mutex_unlock(&mlxsw_sp->acl->rules_lock); block->rule_count++; block->egress_blocker_rule_count += rule->rulei->egress_bind_blocker; return 0; @@ -723,7 +727,9 @@ void mlxsw_sp_acl_rule_del(struct mlxsw_
block->egress_blocker_rule_count -= rule->rulei->egress_bind_blocker; ruleset->ht_key.block->rule_count--; + mutex_lock(&mlxsw_sp->acl->rules_lock); list_del(&rule->list); + mutex_unlock(&mlxsw_sp->acl->rules_lock); if (!ruleset->ht_key.chain_index && mlxsw_sp_acl_ruleset_is_singular(ruleset)) mlxsw_sp_acl_ruleset_block_unbind(mlxsw_sp, ruleset, @@ -783,19 +789,18 @@ static int mlxsw_sp_acl_rules_activity_u struct mlxsw_sp_acl_rule *rule; int err;
- /* Protect internal structures from changes */ - rtnl_lock(); + mutex_lock(&acl->rules_lock); list_for_each_entry(rule, &acl->rules, list) { err = mlxsw_sp_acl_rule_activity_update(acl->mlxsw_sp, rule); if (err) goto err_rule_update; } - rtnl_unlock(); + mutex_unlock(&acl->rules_lock); return 0;
err_rule_update: - rtnl_unlock(); + mutex_unlock(&acl->rules_lock); return err; }
@@ -880,6 +885,7 @@ int mlxsw_sp_acl_init(struct mlxsw_sp *m acl->dummy_fid = fid;
INIT_LIST_HEAD(&acl->rules); + mutex_init(&acl->rules_lock); err = mlxsw_sp_acl_tcam_init(mlxsw_sp, &acl->tcam); if (err) goto err_acl_ops_init; @@ -892,6 +898,7 @@ int mlxsw_sp_acl_init(struct mlxsw_sp *m return 0;
err_acl_ops_init: + mutex_destroy(&acl->rules_lock); mlxsw_sp_fid_put(fid); err_fid_get: rhashtable_destroy(&acl->ruleset_ht); @@ -908,6 +915,7 @@ void mlxsw_sp_acl_fini(struct mlxsw_sp *
cancel_delayed_work_sync(&mlxsw_sp->acl->rule_activity_update.dw); mlxsw_sp_acl_tcam_fini(mlxsw_sp, &acl->tcam); + mutex_destroy(&acl->rules_lock); WARN_ON(!list_empty(&acl->rules)); mlxsw_sp_fid_put(acl->dummy_fid); rhashtable_destroy(&acl->ruleset_ht);
From: Kristian Evensen kristian.evensen@gmail.com
[ Upstream commit bb48eb9b12a95db9d679025927269d4adda6dbd1 ]
When submitting v2 of "fou: Support binding FoU socket" (1713cb37bf67), I accidentally sent the wrong version of the patch and one fix was missing. In the initial version of the patch, as well as the version 2 that I submitted, I incorrectly used ".type" for the two V6-attributes. The correct is to use ".len".
Reported-by: Dmitry Vyukov dvyukov@google.com Fixes: 1713cb37bf67 ("fou: Support binding FoU socket") Signed-off-by: Kristian Evensen kristian.evensen@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fou.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -662,8 +662,8 @@ static const struct nla_policy fou_nl_po [FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, }, [FOU_ATTR_LOCAL_V4] = { .type = NLA_U32, }, [FOU_ATTR_PEER_V4] = { .type = NLA_U32, }, - [FOU_ATTR_LOCAL_V6] = { .type = sizeof(struct in6_addr), }, - [FOU_ATTR_PEER_V6] = { .type = sizeof(struct in6_addr), }, + [FOU_ATTR_LOCAL_V6] = { .len = sizeof(struct in6_addr), }, + [FOU_ATTR_PEER_V6] = { .len = sizeof(struct in6_addr), }, [FOU_ATTR_PEER_PORT] = { .type = NLA_U16, }, [FOU_ATTR_IFINDEX] = { .type = NLA_S32, }, };
From: Maxim Mikityanskiy maximmi@mellanox.com
[ Upstream commit c80794323e82ac6ab45052ebba5757ce47b4b588 ]
Commit 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs") introduces batching of GRO_NORMAL packets in napi_frags_finish, and commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()") adds the same to napi_skb_finish. However, dev_gro_receive (that is called just before napi_{frags,skb}_finish) can also pass skbs to the networking stack: e.g., when the GRO session is flushed, napi_gro_complete is called, which passes pp directly to netif_receive_skb_internal, skipping napi->rx_list. It means that the packet stored in pp will be handled by the stack earlier than the packets that arrived before, but are still waiting in napi->rx_list. It leads to TCP reorderings that can be observed in the TCPOFOQueue counter in netstat.
This commit fixes the reordering issue by making napi_gro_complete also use napi->rx_list, so that all packets going through GRO will keep their order. In order to keep napi_gro_flush working properly, gro_normal_list calls are moved after the flush to clear napi->rx_list.
iwlwifi calls napi_gro_flush directly and does the same thing that is done by gro_normal_list, so the same change is applied there: napi_gro_flush is moved to be before the flush of napi->rx_list.
A few other drivers also use napi_gro_flush (brocade/bna/bnad.c, cortina/gemini.c, hisilicon/hns3/hns3_enet.c). The first two also use napi_complete_done afterwards, which performs the gro_normal_list flush, so they are fine. The latter calls napi_gro_receive right after napi_gro_flush, so it can end up with non-empty napi->rx_list anyway.
Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs") Signed-off-by: Maxim Mikityanskiy maximmi@mellanox.com Cc: Alexander Lobakin alobakin@dlink.ru Cc: Edward Cree ecree@solarflare.com Acked-by: Alexander Lobakin alobakin@dlink.ru Acked-by: Saeed Mahameed saeedm@mellanox.com Acked-by: Edward Cree ecree@solarflare.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 4 - net/core/dev.c | 64 +++++++++++++-------------- 2 files changed, 35 insertions(+), 33 deletions(-)
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c @@ -1537,13 +1537,13 @@ out:
napi = &rxq->napi; if (napi->poll) { + napi_gro_flush(napi, false); + if (napi->rx_count) { netif_receive_skb_list(&napi->rx_list); INIT_LIST_HEAD(&napi->rx_list); napi->rx_count = 0; } - - napi_gro_flush(napi, false); }
iwl_pcie_rxq_restock(trans, rxq); --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5270,9 +5270,29 @@ static void flush_all_backlogs(void) put_online_cpus(); }
+/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */ +static void gro_normal_list(struct napi_struct *napi) +{ + if (!napi->rx_count) + return; + netif_receive_skb_list_internal(&napi->rx_list); + INIT_LIST_HEAD(&napi->rx_list); + napi->rx_count = 0; +} + +/* Queue one GRO_NORMAL SKB up for list processing. If batch size exceeded, + * pass the whole batch up to the stack. + */ +static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb) +{ + list_add_tail(&skb->list, &napi->rx_list); + if (++napi->rx_count >= gro_normal_batch) + gro_normal_list(napi); +} + INDIRECT_CALLABLE_DECLARE(int inet_gro_complete(struct sk_buff *, int)); INDIRECT_CALLABLE_DECLARE(int ipv6_gro_complete(struct sk_buff *, int)); -static int napi_gro_complete(struct sk_buff *skb) +static int napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb) { struct packet_offload *ptype; __be16 type = skb->protocol; @@ -5305,7 +5325,8 @@ static int napi_gro_complete(struct sk_b }
out: - return netif_receive_skb_internal(skb); + gro_normal_one(napi, skb); + return NET_RX_SUCCESS; }
static void __napi_gro_flush_chain(struct napi_struct *napi, u32 index, @@ -5318,7 +5339,7 @@ static void __napi_gro_flush_chain(struc if (flush_old && NAPI_GRO_CB(skb)->age == jiffies) return; skb_list_del_init(skb); - napi_gro_complete(skb); + napi_gro_complete(napi, skb); napi->gro_hash[index].count--; }
@@ -5421,7 +5442,7 @@ static void gro_pull_from_frag0(struct s } }
-static void gro_flush_oldest(struct list_head *head) +static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head) { struct sk_buff *oldest;
@@ -5437,7 +5458,7 @@ static void gro_flush_oldest(struct list * SKB to the chain. */ skb_list_del_init(oldest); - napi_gro_complete(oldest); + napi_gro_complete(napi, oldest); }
INDIRECT_CALLABLE_DECLARE(struct sk_buff *inet_gro_receive(struct list_head *, @@ -5513,7 +5534,7 @@ static enum gro_result dev_gro_receive(s
if (pp) { skb_list_del_init(pp); - napi_gro_complete(pp); + napi_gro_complete(napi, pp); napi->gro_hash[hash].count--; }
@@ -5524,7 +5545,7 @@ static enum gro_result dev_gro_receive(s goto normal;
if (unlikely(napi->gro_hash[hash].count >= MAX_GRO_SKBS)) { - gro_flush_oldest(gro_head); + gro_flush_oldest(napi, gro_head); } else { napi->gro_hash[hash].count++; } @@ -5672,26 +5693,6 @@ struct sk_buff *napi_get_frags(struct na } EXPORT_SYMBOL(napi_get_frags);
-/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */ -static void gro_normal_list(struct napi_struct *napi) -{ - if (!napi->rx_count) - return; - netif_receive_skb_list_internal(&napi->rx_list); - INIT_LIST_HEAD(&napi->rx_list); - napi->rx_count = 0; -} - -/* Queue one GRO_NORMAL SKB up for list processing. If batch size exceeded, - * pass the whole batch up to the stack. - */ -static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb) -{ - list_add_tail(&skb->list, &napi->rx_list); - if (++napi->rx_count >= gro_normal_batch) - gro_normal_list(napi); -} - static gro_result_t napi_frags_finish(struct napi_struct *napi, struct sk_buff *skb, gro_result_t ret) @@ -5979,8 +5980,6 @@ bool napi_complete_done(struct napi_stru NAPIF_STATE_IN_BUSY_POLL))) return false;
- gro_normal_list(n); - if (n->gro_bitmask) { unsigned long timeout = 0;
@@ -5996,6 +5995,9 @@ bool napi_complete_done(struct napi_stru hrtimer_start(&n->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED); } + + gro_normal_list(n); + if (unlikely(!list_empty(&n->poll_list))) { /* If n->poll_list is not empty, we need to mask irqs */ local_irq_save(flags); @@ -6327,8 +6329,6 @@ static int napi_poll(struct napi_struct goto out_unlock; }
- gro_normal_list(n); - if (n->gro_bitmask) { /* flush too old packets * If HZ < 1000, flush all packets. @@ -6336,6 +6336,8 @@ static int napi_poll(struct napi_struct napi_gro_flush(n, HZ >= 1000); }
+ gro_normal_list(n); + /* Some drivers may have called napi_schedule * prior to exhausting their budget. */
From: Paul Blakey paulb@mellanox.com
commit 93b8a7ecb7287cc9b0196f12a25b57c2462d11dc upstream.
The pool sizes represent the pool sizes in the fw. when we request a pool size from fw, it will return the next possible group. We track how many pools the fw has left and start requesting groups from the big to the small. When we start request 4k group, which doesn't exists in fw, fw wants to allocate the next possible size, 64k, but will fail since its exhausted. The correct smallest pool size in fw is 128 and not 4k.
Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Paul Blakey paulb@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -858,7 +858,7 @@ out: */ #define ESW_SIZE (16 * 1024 * 1024) const unsigned int ESW_POOLS[4] = { 4 * 1024 * 1024, 1 * 1024 * 1024, - 64 * 1024, 4 * 1024 }; + 64 * 1024, 128 };
static int get_sz_from_pool(struct mlx5_eswitch *esw)
From: Meir Lichtinger meirl@mellanox.com
commit 505a7f5478062c6cd11e22022d9f1bf64cd8eab3 upstream
Add the upcoming ConnectX-7 device ID.
Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger meirl@mellanox.com Reviewed-by: Eran Ben Elisha eranbe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1569,6 +1569,7 @@ static const struct pci_device_id mlx5_c { PCI_VDEVICE(MELLANOX, 0x101d) }, /* ConnectX-6 Dx */ { PCI_VDEVICE(MELLANOX, 0x101e), MLX5_PCI_DEV_IS_VF}, /* ConnectX Family mlx5Gen Virtual Function */ { PCI_VDEVICE(MELLANOX, 0x101f) }, /* ConnectX-6 LX */ + { PCI_VDEVICE(MELLANOX, 0x1021) }, /* ConnectX-7 */ { PCI_VDEVICE(MELLANOX, 0xa2d2) }, /* BlueField integrated ConnectX-5 network controller */ { PCI_VDEVICE(MELLANOX, 0xa2d3), MLX5_PCI_DEV_IS_VF}, /* BlueField integrated ConnectX-5 network controller VF */ { PCI_VDEVICE(MELLANOX, 0xa2d6) }, /* BlueField-2 integrated ConnectX-6 Dx network controller */
From: Erez Shitrit erezsh@mellanox.com
commmit b850a82114df9b0ec1d191dc64eed1f20a772e0f upstream.
The current code handles only counters that attached to dest, we still have the cases where we have counter on non-dest, like over drop etc.
Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Hamdan Igbaria hamdani@mellanox.com Signed-off-by: Erez Shitrit erezsh@mellanox.com Reviewed-by: Alex Vesker valex@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c | 42 ++++++++++----- 1 file changed, 29 insertions(+), 13 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c @@ -352,26 +352,16 @@ static int mlx5_cmd_dr_create_fte(struct if (fte->action.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) { list_for_each_entry(dst, &fte->node.children, node.list) { enum mlx5_flow_destination_type type = dst->dest_attr.type; - u32 id;
if (num_actions == MLX5_FLOW_CONTEXT_ACTION_MAX) { err = -ENOSPC; goto free_actions; }
- switch (type) { - case MLX5_FLOW_DESTINATION_TYPE_COUNTER: - id = dst->dest_attr.counter_id; + if (type == MLX5_FLOW_DESTINATION_TYPE_COUNTER) + continue;
- tmp_action = - mlx5dr_action_create_flow_counter(id); - if (!tmp_action) { - err = -ENOMEM; - goto free_actions; - } - fs_dr_actions[fs_dr_num_actions++] = tmp_action; - actions[num_actions++] = tmp_action; - break; + switch (type) { case MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE: tmp_action = create_ft_action(dev, dst); if (!tmp_action) { @@ -397,6 +387,32 @@ static int mlx5_cmd_dr_create_fte(struct } }
+ if (fte->action.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) { + list_for_each_entry(dst, &fte->node.children, node.list) { + u32 id; + + if (dst->dest_attr.type != + MLX5_FLOW_DESTINATION_TYPE_COUNTER) + continue; + + if (num_actions == MLX5_FLOW_CONTEXT_ACTION_MAX) { + err = -ENOSPC; + goto free_actions; + } + + id = dst->dest_attr.counter_id; + tmp_action = + mlx5dr_action_create_flow_counter(id); + if (!tmp_action) { + err = -ENOMEM; + goto free_actions; + } + + fs_dr_actions[fs_dr_num_actions++] = tmp_action; + actions[num_actions++] = tmp_action; + } + } + params.match_sz = match_sz; params.match_buf = (u64 *)fte->val;
From: Eli Cohen eli@mellanox.com
commit e401a1848be87123a2b2049addbf21138cb47081 upstream.
Since the implementation relies on limiting the VF transmit rate to simulate ingress rate limiting, and since either uplink representor or ecpf are not associated with a VF, we limit the rate limit configuration for those ports.
Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen eli@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -3951,6 +3951,13 @@ static int apply_police_params(struct ml u32 rate_mbps; int err;
+ vport_num = rpriv->rep->vport; + if (vport_num >= MLX5_VPORT_ECPF) { + NL_SET_ERR_MSG_MOD(extack, + "Ingress rate limit is supported only for Eswitch ports connected to VFs"); + return -EOPNOTSUPP; + } + esw = priv->mdev->priv.eswitch; /* rate is given in bytes/sec. * First convert to bits/sec and then round to the nearest mbit/secs. @@ -3959,8 +3966,6 @@ static int apply_police_params(struct ml * 1 mbit/sec. */ rate_mbps = rate ? max_t(u32, (rate * 8 + 500000) / 1000000, 1) : 0; - vport_num = rpriv->rep->vport; - err = mlx5_esw_modify_vport_rate(esw, vport_num, rate_mbps); if (err) NL_SET_ERR_MSG_MOD(extack, "failed applying action to hardware");
From: Erez Shitrit erezsh@mellanox.com
commit c0702a4bd41829f05638ec2dab70f6bb8d8010ce upstream.
Use raw_smp_processor_id instead of smp_processor_id() otherwise we will get the following trace in debug-kernel: BUG: using smp_processor_id() in preemptible [00000000] code: devlink caller is dr_create_cq.constprop.2+0x31d/0x970 [mlx5_core] Call Trace: dump_stack+0x9a/0xf0 debug_smp_processor_id+0x1f3/0x200 dr_create_cq.constprop.2+0x31d/0x970 genl_family_rcv_msg+0x5fd/0x1170 genl_rcv_msg+0xb8/0x160 netlink_rcv_skb+0x11e/0x340
Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Erez Shitrit erezsh@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB /* Copyright (c) 2019 Mellanox Technologies. */
+#include <linux/smp.h> #include "dr_types.h"
#define QUEUE_SIZE 128 @@ -729,7 +730,7 @@ static struct mlx5dr_cq *dr_create_cq(st if (!in) goto err_cqwq;
- vector = smp_processor_id() % mlx5_comp_vectors_count(mdev); + vector = raw_smp_processor_id() % mlx5_comp_vectors_count(mdev); err = mlx5_vector2eqn(mdev, vector, &eqn, &irqn); if (err) { kvfree(in);
From: Tariq Toukan tariqt@mellanox.com
commit ffbd9ca94e2ebbfe802d4b28bab5ba19818de853 upstream.
There are the following cases:
1. Packet ends before start marker: bypass offload. 2. Packet starts before start marker and ends after it: drop, not supported, breaks contract with kernel. 3. packet ends before tls record info starts: drop, this packet was already acknowledged and its record info was released.
Add the above as comment in code.
Mind possible wraparounds of the TCP seq, replace the simple comparison with a call to the TCP before() method.
In addition, remove logic that handles negative sync_len values, as it became impossible.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: Boris Pismenny borisp@mellanox.com Reviewed-by: Boris Pismenny borisp@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 33 +++++++------ 1 file changed, 19 insertions(+), 14 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c @@ -180,7 +180,7 @@ mlx5e_ktls_tx_post_param_wqes(struct mlx
struct tx_sync_info { u64 rcd_sn; - s32 sync_len; + u32 sync_len; int nr_frags; skb_frag_t frags[MAX_SKB_FRAGS]; }; @@ -193,13 +193,14 @@ enum mlx5e_ktls_sync_retval {
static enum mlx5e_ktls_sync_retval tx_sync_info_get(struct mlx5e_ktls_offload_context_tx *priv_tx, - u32 tcp_seq, struct tx_sync_info *info) + u32 tcp_seq, int datalen, struct tx_sync_info *info) { struct tls_offload_context_tx *tx_ctx = priv_tx->tx_ctx; enum mlx5e_ktls_sync_retval ret = MLX5E_KTLS_SYNC_DONE; struct tls_record_info *record; int remaining, i = 0; unsigned long flags; + bool ends_before;
spin_lock_irqsave(&tx_ctx->lock, flags); record = tls_get_record(tx_ctx, tcp_seq, &info->rcd_sn); @@ -209,9 +210,21 @@ tx_sync_info_get(struct mlx5e_ktls_offlo goto out; }
- if (unlikely(tcp_seq < tls_record_start_seq(record))) { - ret = tls_record_is_start_marker(record) ? - MLX5E_KTLS_SYNC_SKIP_NO_DATA : MLX5E_KTLS_SYNC_FAIL; + /* There are the following cases: + * 1. packet ends before start marker: bypass offload. + * 2. packet starts before start marker and ends after it: drop, + * not supported, breaks contract with kernel. + * 3. packet ends before tls record info starts: drop, + * this packet was already acknowledged and its record info + * was released. + */ + ends_before = before(tcp_seq + datalen, tls_record_start_seq(record)); + + if (unlikely(tls_record_is_start_marker(record))) { + ret = ends_before ? MLX5E_KTLS_SYNC_SKIP_NO_DATA : MLX5E_KTLS_SYNC_FAIL; + goto out; + } else if (ends_before) { + ret = MLX5E_KTLS_SYNC_FAIL; goto out; }
@@ -337,7 +350,7 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_kt u8 num_wqebbs; int i = 0;
- ret = tx_sync_info_get(priv_tx, seq, &info); + ret = tx_sync_info_get(priv_tx, seq, datalen, &info); if (unlikely(ret != MLX5E_KTLS_SYNC_DONE)) { if (ret == MLX5E_KTLS_SYNC_SKIP_NO_DATA) { stats->tls_skip_no_sync_data++; @@ -351,14 +364,6 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_kt goto err_out; }
- if (unlikely(info.sync_len < 0)) { - if (likely(datalen <= -info.sync_len)) - return MLX5E_KTLS_SYNC_DONE; - - stats->tls_drop_bypass_req++; - goto err_out; - } - stats->tls_ooo++;
tx_post_resync_params(sq, priv_tx, info.rcd_sn);
From: Tariq Toukan tariqt@mellanox.com
commit 1e92899791358dba94a9db7cc3b6004636b5a2f6 upstream.
The call to tx_post_resync_params() is done earlier in the flow, the post of the control WQEs is unnecessarily repeated. Remove it.
Fixes: 700ec4974240 ("net/mlx5e: kTLS, Fix missing SQ edge fill") Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: Boris Pismenny borisp@mellanox.com Reviewed-by: Boris Pismenny borisp@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c @@ -383,8 +383,6 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_kt if (unlikely(contig_wqebbs_room < num_wqebbs)) mlx5e_fill_sq_frag_edge(sq, wq, pi, contig_wqebbs_room);
- tx_post_resync_params(sq, priv_tx, info.rcd_sn); - for (; i < info.nr_frags; i++) { unsigned int orig_fsz, frag_offset = 0, n = 0; skb_frag_t *f = &info.frags[i];
From: Tariq Toukan tariqt@mellanox.com
commit 342508c1c7540e281fd36151c175ba5ff954a99f upstream.
When TCP out-of-order is identified (unexpected tcp seq mismatch), driver analyzes the packet and decides what handling should it get: 1. go to accelerated path (to be encrypted in HW), 2. go to regular xmit path (send w/o encryption), 3. drop.
Packets marked with skb->decrypted by the TLS stack in the TX flow skips SW encryption, and rely on the HW offload. Verify that such packets are never sent un-encrypted on the wire. Add a WARN to catch such bugs, and prefer dropping the packet in these cases.
Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: Boris Pismenny borisp@mellanox.com Reviewed-by: Boris Pismenny borisp@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 14 +++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c @@ -458,12 +458,18 @@ struct sk_buff *mlx5e_ktls_handle_tx_skb enum mlx5e_ktls_sync_retval ret = mlx5e_ktls_tx_handle_ooo(priv_tx, sq, datalen, seq);
- if (likely(ret == MLX5E_KTLS_SYNC_DONE)) + switch (ret) { + case MLX5E_KTLS_SYNC_DONE: *wqe = mlx5e_sq_fetch_wqe(sq, sizeof(**wqe), pi); - else if (ret == MLX5E_KTLS_SYNC_FAIL) + break; + case MLX5E_KTLS_SYNC_SKIP_NO_DATA: + if (likely(!skb->decrypted)) + goto out; + WARN_ON_ONCE(1); + /* fall-through */ + default: /* MLX5E_KTLS_SYNC_FAIL */ goto err_out; - else /* ret == MLX5E_KTLS_SYNC_SKIP_NO_DATA */ - goto out; + } }
priv_tx->expected_seq = seq + datalen;
From: David Ahern dsahern@gmail.com
[ Upstream commit 9827c0634e461703abf81e8cc8b7adf5da5886d0 ]
Sven-Haegar reported looping on fib dumps when 255.255.255.255 route has been added to a table. The looping is caused by the key rolling over from FFFFFFFF to 0. When dumping a specific table only, we need a means to detect when the table dump is done. The key and count saved to cb args are both 0 only at the start of the table dump. If key is 0 and count > 0, then we are in the rollover case. Detect and return to avoid looping.
This only affects dumps of a specific table; for dumps of all tables (the case prior to the change in the Fixes tag) inet_dump_fib moved the entry counter to the next table and reset the cb args used by fib_table_dump and fn_trie_dump_leaf, so the rollover ffffffff back to 0 did not cause looping with the dumps.
Fixes: effe67926624 ("net: Enable kernel side filtering of route dumps") Reported-by: Sven-Haegar Koch haegar@sdinet.de Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fib_trie.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -2175,6 +2175,12 @@ int fib_table_dump(struct fib_table *tb, int count = cb->args[2]; t_key key = cb->args[3];
+ /* First time here, count and key are both always 0. Count > 0 + * and key == 0 means the dump has wrapped around and we are done. + */ + if (count && !key) + return skb->len; + while ((l = leaf_walk_rcu(&tp, key)) != NULL) { int err;
From: Jens Axboe axboe@kernel.dk
commit 73e08e711d9c1d79fae01daed4b0e1fee5f8a275 upstream.
This ends up being too restrictive for tasks that willingly fork and share the ring between forks. Andres reports that this breaks his postgresql work. Since we're close to 5.5 release, revert this change for now.
Cc: stable@vger.kernel.org Fixes: 44d282796f81 ("io_uring: only allow submit from owning task") Reported-by: Andres Freund andres@anarazel.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/io_uring.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3716,12 +3716,6 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned wake_up(&ctx->sqo_wait); submitted = to_submit; } else if (to_submit) { - if (current->mm != ctx->sqo_mm || - current_cred() != ctx->creds) { - ret = -EPERM; - goto out; - } - to_submit = min(to_submit, ctx->sq_entries);
mutex_lock(&ctx->uring_lock);
From: David Howells dhowells@redhat.com
commit a45ea48e2bcd92c1f678b794f488ca0bda9835b8 upstream.
The afs filesystem needs to prohibit certain characters from cell names, such as '/', as these are used to form filenames in procfs, leading to the following warning being generated:
WARNING: CPU: 0 PID: 3489 at fs/proc/generic.c:178
Fix afs_alloc_cell() to disallow nonprintable characters, '/', '@' and names that begin with a dot.
Remove the check for "@cell" as that is then redundant.
This can be tested by running:
echo add foo/.bar 1.2.3.4 >/proc/fs/afs/cells
Note that we will also need to deal with:
- Names ending in ".invalid" shouldn't be passed to the DNS.
- Names that contain non-valid domainname chars shouldn't be passed to the DNS.
- DNS replies that say "your-dns-needs-immediate-attention.<gTLD>" and replies containing A records that say 127.0.53.53 should be considered invalid. [https://www.icann.org/en/system/files/files/name-collision-mitigation-01aug1...]
but these need to be dealt with by the kafs-client DNS program rather than the kernel.
Reported-by: syzbot+b904ba7c947a37b4b291@syzkaller.appspotmail.com Cc: stable@kernel.org Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/afs/cell.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/fs/afs/cell.c +++ b/fs/afs/cell.c @@ -134,8 +134,17 @@ static struct afs_cell *afs_alloc_cell(s _leave(" = -ENAMETOOLONG"); return ERR_PTR(-ENAMETOOLONG); } - if (namelen == 5 && memcmp(name, "@cell", 5) == 0) + + /* Prohibit cell names that contain unprintable chars, '/' and '@' or + * that begin with a dot. This also precludes "@cell". + */ + if (name[0] == '.') return ERR_PTR(-EINVAL); + for (i = 0; i < namelen; i++) { + char ch = name[i]; + if (!isprint(ch) || ch == '/' || ch == '@') + return ERR_PTR(-EINVAL); + }
_enter("%*.*s,%s", namelen, namelen, name, addresses);
From: Luuk Paulussen luuk.paulussen@alliedtelesis.co.nz
commit cf3ca1877574a306c0207cbf7fdf25419d9229df upstream.
reg2volt returns the voltage that matches a given register value. Converting this back the other way with volt2reg didn't return the same register value because it used truncation instead of rounding.
This meant that values read from sysfs could not be written back to sysfs to set back the same register value.
With this change, volt2reg will return the same value for every voltage previously returned by reg2volt (for the set of possible input values)
Signed-off-by: Luuk Paulussen luuk.paulussen@alliedtelesis.co.nz Link: https://lore.kernel.org/r/20191205231659.1301-1-luuk.paulussen@alliedtelesis... cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/adt7475.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/hwmon/adt7475.c +++ b/drivers/hwmon/adt7475.c @@ -294,9 +294,10 @@ static inline u16 volt2reg(int channel, long reg;
if (bypass_attn & (1 << channel)) - reg = (volt * 1024) / 2250; + reg = DIV_ROUND_CLOSEST(volt * 1024, 2250); else - reg = (volt * r[1] * 1024) / ((r[0] + r[1]) * 2250); + reg = DIV_ROUND_CLOSEST(volt * r[1] * 1024, + (r[0] + r[1]) * 2250); return clamp_val(reg, 0, 1023) & (0xff << 2); }
From: Guenter Roeck linux@roeck-us.net
commit 3bf8bdcf3bada771eb12b57f2a30caee69e8ab8d upstream.
The hwmon core uses device managed functions, tied to the hwmon parent device, for various internal memory allocations. This is problematic since hwmon device lifetime does not necessarily match its parent's device lifetime. If there is a mismatch, memory leaks will accumulate until the parent device is released.
Fix the problem by managing all memory allocations internally. The only exception is memory allocation for thermal device registration, which can be tied to the hwmon device, along with thermal device registration itself.
Fixes: d560168b5d0f ("hwmon: (core) New hwmon registration API") Cc: stable@vger.kernel.org # v4.14.x: 47c332deb8e8: hwmon: Deal with errors from the thermal subsystem Cc: stable@vger.kernel.org # v4.14.x: 74e3512731bd: hwmon: (core) Fix double-free in __hwmon_device_register() Cc: stable@vger.kernel.org # v4.9.x: 3a412d5e4a1c: hwmon: (core) Simplify sysfs attribute name allocation Cc: stable@vger.kernel.org # v4.9.x: 47c332deb8e8: hwmon: Deal with errors from the thermal subsystem Cc: stable@vger.kernel.org # v4.9.x: 74e3512731bd: hwmon: (core) Fix double-free in __hwmon_device_register() Cc: stable@vger.kernel.org # v4.9+ Cc: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/hwmon.c | 68 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 41 insertions(+), 27 deletions(-)
--- a/drivers/hwmon/hwmon.c +++ b/drivers/hwmon/hwmon.c @@ -51,6 +51,7 @@ struct hwmon_device_attribute {
#define to_hwmon_attr(d) \ container_of(d, struct hwmon_device_attribute, dev_attr) +#define to_dev_attr(a) container_of(a, struct device_attribute, attr)
/* * Thermal zone information @@ -58,7 +59,7 @@ struct hwmon_device_attribute { * also provides the sensor index. */ struct hwmon_thermal_data { - struct hwmon_device *hwdev; /* Reference to hwmon device */ + struct device *dev; /* Reference to hwmon device */ int index; /* sensor index */ };
@@ -95,9 +96,27 @@ static const struct attribute_group *hwm NULL };
+static void hwmon_free_attrs(struct attribute **attrs) +{ + int i; + + for (i = 0; attrs[i]; i++) { + struct device_attribute *dattr = to_dev_attr(attrs[i]); + struct hwmon_device_attribute *hattr = to_hwmon_attr(dattr); + + kfree(hattr); + } + kfree(attrs); +} + static void hwmon_dev_release(struct device *dev) { - kfree(to_hwmon_device(dev)); + struct hwmon_device *hwdev = to_hwmon_device(dev); + + if (hwdev->group.attrs) + hwmon_free_attrs(hwdev->group.attrs); + kfree(hwdev->groups); + kfree(hwdev); }
static struct class hwmon_class = { @@ -119,11 +138,11 @@ static DEFINE_IDA(hwmon_ida); static int hwmon_thermal_get_temp(void *data, int *temp) { struct hwmon_thermal_data *tdata = data; - struct hwmon_device *hwdev = tdata->hwdev; + struct hwmon_device *hwdev = to_hwmon_device(tdata->dev); int ret; long t;
- ret = hwdev->chip->ops->read(&hwdev->dev, hwmon_temp, hwmon_temp_input, + ret = hwdev->chip->ops->read(tdata->dev, hwmon_temp, hwmon_temp_input, tdata->index, &t); if (ret < 0) return ret; @@ -137,8 +156,7 @@ static const struct thermal_zone_of_devi .get_temp = hwmon_thermal_get_temp, };
-static int hwmon_thermal_add_sensor(struct device *dev, - struct hwmon_device *hwdev, int index) +static int hwmon_thermal_add_sensor(struct device *dev, int index) { struct hwmon_thermal_data *tdata; struct thermal_zone_device *tzd; @@ -147,10 +165,10 @@ static int hwmon_thermal_add_sensor(stru if (!tdata) return -ENOMEM;
- tdata->hwdev = hwdev; + tdata->dev = dev; tdata->index = index;
- tzd = devm_thermal_zone_of_sensor_register(&hwdev->dev, index, tdata, + tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, &hwmon_thermal_ops); /* * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, @@ -162,8 +180,7 @@ static int hwmon_thermal_add_sensor(stru return 0; } #else -static int hwmon_thermal_add_sensor(struct device *dev, - struct hwmon_device *hwdev, int index) +static int hwmon_thermal_add_sensor(struct device *dev, int index) { return 0; } @@ -250,8 +267,7 @@ static bool is_string_attr(enum hwmon_se (type == hwmon_fan && attr == hwmon_fan_label); }
-static struct attribute *hwmon_genattr(struct device *dev, - const void *drvdata, +static struct attribute *hwmon_genattr(const void *drvdata, enum hwmon_sensor_types type, u32 attr, int index, @@ -279,7 +295,7 @@ static struct attribute *hwmon_genattr(s if ((mode & 0222) && !ops->write) return ERR_PTR(-EINVAL);
- hattr = devm_kzalloc(dev, sizeof(*hattr), GFP_KERNEL); + hattr = kzalloc(sizeof(*hattr), GFP_KERNEL); if (!hattr) return ERR_PTR(-ENOMEM);
@@ -492,8 +508,7 @@ static int hwmon_num_channel_attrs(const return n; }
-static int hwmon_genattrs(struct device *dev, - const void *drvdata, +static int hwmon_genattrs(const void *drvdata, struct attribute **attrs, const struct hwmon_ops *ops, const struct hwmon_channel_info *info) @@ -519,7 +534,7 @@ static int hwmon_genattrs(struct device attr_mask &= ~BIT(attr); if (attr >= template_size) return -EINVAL; - a = hwmon_genattr(dev, drvdata, info->type, attr, i, + a = hwmon_genattr(drvdata, info->type, attr, i, templates[attr], ops); if (IS_ERR(a)) { if (PTR_ERR(a) != -ENOENT) @@ -533,8 +548,7 @@ static int hwmon_genattrs(struct device }
static struct attribute ** -__hwmon_create_attrs(struct device *dev, const void *drvdata, - const struct hwmon_chip_info *chip) +__hwmon_create_attrs(const void *drvdata, const struct hwmon_chip_info *chip) { int ret, i, aindex = 0, nattrs = 0; struct attribute **attrs; @@ -545,15 +559,17 @@ __hwmon_create_attrs(struct device *dev, if (nattrs == 0) return ERR_PTR(-EINVAL);
- attrs = devm_kcalloc(dev, nattrs + 1, sizeof(*attrs), GFP_KERNEL); + attrs = kcalloc(nattrs + 1, sizeof(*attrs), GFP_KERNEL); if (!attrs) return ERR_PTR(-ENOMEM);
for (i = 0; chip->info[i]; i++) { - ret = hwmon_genattrs(dev, drvdata, &attrs[aindex], chip->ops, + ret = hwmon_genattrs(drvdata, &attrs[aindex], chip->ops, chip->info[i]); - if (ret < 0) + if (ret < 0) { + hwmon_free_attrs(attrs); return ERR_PTR(ret); + } aindex += ret; }
@@ -595,14 +611,13 @@ __hwmon_device_register(struct device *d for (i = 0; groups[i]; i++) ngroups++;
- hwdev->groups = devm_kcalloc(dev, ngroups, sizeof(*groups), - GFP_KERNEL); + hwdev->groups = kcalloc(ngroups, sizeof(*groups), GFP_KERNEL); if (!hwdev->groups) { err = -ENOMEM; goto free_hwmon; }
- attrs = __hwmon_create_attrs(dev, drvdata, chip); + attrs = __hwmon_create_attrs(drvdata, chip); if (IS_ERR(attrs)) { err = PTR_ERR(attrs); goto free_hwmon; @@ -647,8 +662,7 @@ __hwmon_device_register(struct device *d hwmon_temp_input, j)) continue; if (info[i]->config[j] & HWMON_T_INPUT) { - err = hwmon_thermal_add_sensor(dev, - hwdev, j); + err = hwmon_thermal_add_sensor(hdev, j); if (err) { device_unregister(hdev); /* @@ -667,7 +681,7 @@ __hwmon_device_register(struct device *d return hdev;
free_hwmon: - kfree(hwdev); + hwmon_dev_release(hdev); ida_remove: ida_simple_remove(&hwmon_ida, id); return ERR_PTR(err);
From: Jeff Layton jlayton@kernel.org
commit 9c1c2b35f1d94de8325344c2777d7ee67492db3b upstream.
Currently, we just assume that it will stick around by virtue of the submitter's reference, but later patches will allow the syscall to return early and we can't rely on that reference at that point.
While I'm not aware of any reports of it, Xiubo pointed out that this may fix a use-after-free. If the wait for a reply times out or is canceled via signal, and then the reply comes in after the syscall returns, the client can end up trying to access r_parent without a reference.
Take an extra reference to the inode when setting r_parent and release it when releasing the request.
Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/mds_client.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -708,8 +708,10 @@ void ceph_mdsc_release_request(struct kr /* avoid calling iput_final() in mds dispatch threads */ ceph_async_iput(req->r_inode); } - if (req->r_parent) + if (req->r_parent) { ceph_put_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN); + ceph_async_iput(req->r_parent); + } ceph_async_iput(req->r_target_inode); if (req->r_dentry) dput(req->r_dentry); @@ -2670,8 +2672,10 @@ int ceph_mdsc_submit_request(struct ceph /* take CAP_PIN refs for r_inode, r_parent, r_old_dentry */ if (req->r_inode) ceph_get_cap_refs(ceph_inode(req->r_inode), CEPH_CAP_PIN); - if (req->r_parent) + if (req->r_parent) { ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN); + ihold(req->r_parent); + } if (req->r_old_dentry_dir) ceph_get_cap_refs(ceph_inode(req->r_old_dentry_dir), CEPH_CAP_PIN);
From: Alex Deucher alexander.deucher@amd.com
commit 5e89cd303e3a4505752952259b9f1ba036632544 upstream.
To account for parts of the chip that are "harvested" (disabled) due to silicon flaws, caches on some AMD GPUs must be initialized before ATS is enabled.
ATS is normally enabled by the IOMMU driver before the GPU driver loads, so this cache initialization would have to be done in a quirk, but that's too complex to be practical.
For Navi14 (device ID 0x7340), this initialization is done by the VBIOS, but apparently some boards went to production with an older VBIOS that doesn't do it. Disable ATS for those boards.
Link: https://lore.kernel.org/r/20200114205523.1054271-3-alexander.deucher@amd.com Bug: https://gitlab.freedesktop.org/drm/amd/issues/1015 See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken") See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken") [bhelgaas: squash into one patch, simplify slightly, commit log] Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/quirks.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -5021,18 +5021,25 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SE
#ifdef CONFIG_PCI_ATS /* - * Some devices have a broken ATS implementation causing IOMMU stalls. - * Don't use ATS for those devices. + * Some devices require additional driver setup to enable ATS. Don't use + * ATS for those devices as ATS will be enabled before the driver has had a + * chance to load and configure the device. */ -static void quirk_no_ats(struct pci_dev *pdev) +static void quirk_amd_harvest_no_ats(struct pci_dev *pdev) { - pci_info(pdev, "disabling ATS (broken on this device)\n"); + if (pdev->device == 0x7340 && pdev->revision != 0xc5) + return; + + pci_info(pdev, "disabling ATS\n"); pdev->ats_cap = 0; }
/* AMD Stoney platform GPU */ -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats); -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_amd_harvest_no_ats); +/* AMD Iceland dGPU */ +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats); +/* AMD Navi14 dGPU */ +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats); #endif /* CONFIG_PCI_ATS */
/* Freescale PCIe doesn't support MSI in RC mode */
From: Boris Brezillon boris.brezillon@collabora.com
commit bdefca2d8dc0f80bbe49e08bf52a717146490706 upstream.
With the introduction of per-FD address space, the same BO can be mapped in different address space if the BO is globally visible (GEM_FLINK) and opened in different context or if the dmabuf is self-imported. The current implementation does not take case into account, and attaches the mapping directly to the panfrost_gem_object.
Let's create a panfrost_gem_mapping struct and allow multiple mappings per BO.
The mappings are refcounted which helps solve another problem where mappings were torn down (GEM handle closed by userspace) while GPU jobs accessing those BOs were still in-flight. Jobs now keep a reference on the mappings they use.
v2 (robh): - Minor review comment clean-ups from Steven - Use list_is_singular helper - Just WARN if we add a mapping when madvise state is not WILLNEED. With that, drop the use of object_name_lock.
v3 (robh): - Revert returning list iterator in panfrost_gem_mapping_get()
Fixes: a5efb4c9a562 ("drm/panfrost: Restructure the GEM object creation") Fixes: 7282f7645d06 ("drm/panfrost: Implement per FD address spaces") Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon boris.brezillon@collabora.com Signed-off-by: Rob Herring robh@kernel.org Acked-by: Boris Brezillon boris.brezillon@collabora.com Reviewed-by: Steven Price steven.price@arm.com Link: https://patchwork.freedesktop.org/patch/msgid/20200116021554.15090-1-robh@ke... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/panfrost/panfrost_drv.c | 91 +++++++++++++++- drivers/gpu/drm/panfrost/panfrost_gem.c | 124 ++++++++++++++++++++--- drivers/gpu/drm/panfrost/panfrost_gem.h | 41 ++++++- drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c | 3 drivers/gpu/drm/panfrost/panfrost_job.c | 13 ++ drivers/gpu/drm/panfrost/panfrost_job.h | 1 drivers/gpu/drm/panfrost/panfrost_mmu.c | 61 ++++++----- drivers/gpu/drm/panfrost/panfrost_mmu.h | 6 - drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 34 ++++-- 9 files changed, 300 insertions(+), 74 deletions(-)
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -78,8 +78,10 @@ static int panfrost_ioctl_get_param(stru static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file) { + struct panfrost_file_priv *priv = file->driver_priv; struct panfrost_gem_object *bo; struct drm_panfrost_create_bo *args = data; + struct panfrost_gem_mapping *mapping;
if (!args->size || args->pad || (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP))) @@ -95,7 +97,14 @@ static int panfrost_ioctl_create_bo(stru if (IS_ERR(bo)) return PTR_ERR(bo);
- args->offset = bo->node.start << PAGE_SHIFT; + mapping = panfrost_gem_mapping_get(bo, priv); + if (!mapping) { + drm_gem_object_put_unlocked(&bo->base.base); + return -EINVAL; + } + + args->offset = mapping->mmnode.start << PAGE_SHIFT; + panfrost_gem_mapping_put(mapping);
return 0; } @@ -119,6 +128,11 @@ panfrost_lookup_bos(struct drm_device *d struct drm_panfrost_submit *args, struct panfrost_job *job) { + struct panfrost_file_priv *priv = file_priv->driver_priv; + struct panfrost_gem_object *bo; + unsigned int i; + int ret; + job->bo_count = args->bo_handle_count;
if (!job->bo_count) @@ -130,9 +144,32 @@ panfrost_lookup_bos(struct drm_device *d if (!job->implicit_fences) return -ENOMEM;
- return drm_gem_objects_lookup(file_priv, - (void __user *)(uintptr_t)args->bo_handles, - job->bo_count, &job->bos); + ret = drm_gem_objects_lookup(file_priv, + (void __user *)(uintptr_t)args->bo_handles, + job->bo_count, &job->bos); + if (ret) + return ret; + + job->mappings = kvmalloc_array(job->bo_count, + sizeof(struct panfrost_gem_mapping *), + GFP_KERNEL | __GFP_ZERO); + if (!job->mappings) + return -ENOMEM; + + for (i = 0; i < job->bo_count; i++) { + struct panfrost_gem_mapping *mapping; + + bo = to_panfrost_bo(job->bos[i]); + mapping = panfrost_gem_mapping_get(bo, priv); + if (!mapping) { + ret = -EINVAL; + break; + } + + job->mappings[i] = mapping; + } + + return ret; }
/** @@ -320,7 +357,9 @@ out: static int panfrost_ioctl_get_bo_offset(struct drm_device *dev, void *data, struct drm_file *file_priv) { + struct panfrost_file_priv *priv = file_priv->driver_priv; struct drm_panfrost_get_bo_offset *args = data; + struct panfrost_gem_mapping *mapping; struct drm_gem_object *gem_obj; struct panfrost_gem_object *bo;
@@ -331,18 +370,26 @@ static int panfrost_ioctl_get_bo_offset( } bo = to_panfrost_bo(gem_obj);
- args->offset = bo->node.start << PAGE_SHIFT; - + mapping = panfrost_gem_mapping_get(bo, priv); drm_gem_object_put_unlocked(gem_obj); + + if (!mapping) + return -EINVAL; + + args->offset = mapping->mmnode.start << PAGE_SHIFT; + panfrost_gem_mapping_put(mapping); return 0; }
static int panfrost_ioctl_madvise(struct drm_device *dev, void *data, struct drm_file *file_priv) { + struct panfrost_file_priv *priv = file_priv->driver_priv; struct drm_panfrost_madvise *args = data; struct panfrost_device *pfdev = dev->dev_private; struct drm_gem_object *gem_obj; + struct panfrost_gem_object *bo; + int ret = 0;
gem_obj = drm_gem_object_lookup(file_priv, args->handle); if (!gem_obj) { @@ -350,22 +397,48 @@ static int panfrost_ioctl_madvise(struct return -ENOENT; }
+ bo = to_panfrost_bo(gem_obj); + mutex_lock(&pfdev->shrinker_lock); + mutex_lock(&bo->mappings.lock); + if (args->madv == PANFROST_MADV_DONTNEED) { + struct panfrost_gem_mapping *first; + + first = list_first_entry(&bo->mappings.list, + struct panfrost_gem_mapping, + node); + + /* + * If we want to mark the BO purgeable, there must be only one + * user: the caller FD. + * We could do something smarter and mark the BO purgeable only + * when all its users have marked it purgeable, but globally + * visible/shared BOs are likely to never be marked purgeable + * anyway, so let's not bother. + */ + if (!list_is_singular(&bo->mappings.list) || + WARN_ON_ONCE(first->mmu != &priv->mmu)) { + ret = -EINVAL; + goto out_unlock_mappings; + } + } + args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
if (args->retained) { - struct panfrost_gem_object *bo = to_panfrost_bo(gem_obj); - if (args->madv == PANFROST_MADV_DONTNEED) list_add_tail(&bo->base.madv_list, &pfdev->shrinker_list); else if (args->madv == PANFROST_MADV_WILLNEED) list_del_init(&bo->base.madv_list); } + +out_unlock_mappings: + mutex_unlock(&bo->mappings.lock); mutex_unlock(&pfdev->shrinker_lock);
drm_gem_object_put_unlocked(gem_obj); - return 0; + return ret; }
int panfrost_unstable_ioctl_check(void) --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -29,6 +29,12 @@ static void panfrost_gem_free_object(str list_del_init(&bo->base.madv_list); mutex_unlock(&pfdev->shrinker_lock);
+ /* + * If we still have mappings attached to the BO, there's a problem in + * our refcounting. + */ + WARN_ON_ONCE(!list_empty(&bo->mappings.list)); + if (bo->sgts) { int i; int n_sgt = bo->base.base.size / SZ_2M; @@ -46,6 +52,69 @@ static void panfrost_gem_free_object(str drm_gem_shmem_free_object(obj); }
+struct panfrost_gem_mapping * +panfrost_gem_mapping_get(struct panfrost_gem_object *bo, + struct panfrost_file_priv *priv) +{ + struct panfrost_gem_mapping *iter, *mapping = NULL; + + mutex_lock(&bo->mappings.lock); + list_for_each_entry(iter, &bo->mappings.list, node) { + if (iter->mmu == &priv->mmu) { + kref_get(&iter->refcount); + mapping = iter; + break; + } + } + mutex_unlock(&bo->mappings.lock); + + return mapping; +} + +static void +panfrost_gem_teardown_mapping(struct panfrost_gem_mapping *mapping) +{ + struct panfrost_file_priv *priv; + + if (mapping->active) + panfrost_mmu_unmap(mapping); + + priv = container_of(mapping->mmu, struct panfrost_file_priv, mmu); + spin_lock(&priv->mm_lock); + if (drm_mm_node_allocated(&mapping->mmnode)) + drm_mm_remove_node(&mapping->mmnode); + spin_unlock(&priv->mm_lock); +} + +static void panfrost_gem_mapping_release(struct kref *kref) +{ + struct panfrost_gem_mapping *mapping; + + mapping = container_of(kref, struct panfrost_gem_mapping, refcount); + + panfrost_gem_teardown_mapping(mapping); + drm_gem_object_put_unlocked(&mapping->obj->base.base); + kfree(mapping); +} + +void panfrost_gem_mapping_put(struct panfrost_gem_mapping *mapping) +{ + if (!mapping) + return; + + kref_put(&mapping->refcount, panfrost_gem_mapping_release); +} + +void panfrost_gem_teardown_mappings(struct panfrost_gem_object *bo) +{ + struct panfrost_gem_mapping *mapping; + + mutex_lock(&bo->mappings.lock); + list_for_each_entry(mapping, &bo->mappings.list, node) + panfrost_gem_teardown_mapping(mapping); + mutex_unlock(&bo->mappings.lock); +} + int panfrost_gem_open(struct drm_gem_object *obj, struct drm_file *file_priv) { int ret; @@ -54,6 +123,16 @@ int panfrost_gem_open(struct drm_gem_obj struct panfrost_gem_object *bo = to_panfrost_bo(obj); unsigned long color = bo->noexec ? PANFROST_BO_NOEXEC : 0; struct panfrost_file_priv *priv = file_priv->driver_priv; + struct panfrost_gem_mapping *mapping; + + mapping = kzalloc(sizeof(*mapping), GFP_KERNEL); + if (!mapping) + return -ENOMEM; + + INIT_LIST_HEAD(&mapping->node); + kref_init(&mapping->refcount); + drm_gem_object_get(obj); + mapping->obj = bo;
/* * Executable buffers cannot cross a 16MB boundary as the program @@ -66,37 +145,48 @@ int panfrost_gem_open(struct drm_gem_obj else align = size >= SZ_2M ? SZ_2M >> PAGE_SHIFT : 0;
- bo->mmu = &priv->mmu; + mapping->mmu = &priv->mmu; spin_lock(&priv->mm_lock); - ret = drm_mm_insert_node_generic(&priv->mm, &bo->node, + ret = drm_mm_insert_node_generic(&priv->mm, &mapping->mmnode, size >> PAGE_SHIFT, align, color, 0); spin_unlock(&priv->mm_lock); if (ret) - return ret; + goto err;
if (!bo->is_heap) { - ret = panfrost_mmu_map(bo); - if (ret) { - spin_lock(&priv->mm_lock); - drm_mm_remove_node(&bo->node); - spin_unlock(&priv->mm_lock); - } + ret = panfrost_mmu_map(mapping); + if (ret) + goto err; } + + mutex_lock(&bo->mappings.lock); + WARN_ON(bo->base.madv != PANFROST_MADV_WILLNEED); + list_add_tail(&mapping->node, &bo->mappings.list); + mutex_unlock(&bo->mappings.lock); + +err: + if (ret) + panfrost_gem_mapping_put(mapping); return ret; }
void panfrost_gem_close(struct drm_gem_object *obj, struct drm_file *file_priv) { - struct panfrost_gem_object *bo = to_panfrost_bo(obj); struct panfrost_file_priv *priv = file_priv->driver_priv; + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + struct panfrost_gem_mapping *mapping = NULL, *iter;
- if (bo->is_mapped) - panfrost_mmu_unmap(bo); + mutex_lock(&bo->mappings.lock); + list_for_each_entry(iter, &bo->mappings.list, node) { + if (iter->mmu == &priv->mmu) { + mapping = iter; + list_del(&iter->node); + break; + } + } + mutex_unlock(&bo->mappings.lock);
- spin_lock(&priv->mm_lock); - if (drm_mm_node_allocated(&bo->node)) - drm_mm_remove_node(&bo->node); - spin_unlock(&priv->mm_lock); + panfrost_gem_mapping_put(mapping); }
static int panfrost_gem_pin(struct drm_gem_object *obj) @@ -136,6 +226,8 @@ struct drm_gem_object *panfrost_gem_crea if (!obj) return NULL;
+ INIT_LIST_HEAD(&obj->mappings.list); + mutex_init(&obj->mappings.lock); obj->base.base.funcs = &panfrost_gem_funcs;
return &obj->base.base; --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -13,23 +13,46 @@ struct panfrost_gem_object { struct drm_gem_shmem_object base; struct sg_table *sgts;
- struct panfrost_mmu *mmu; - struct drm_mm_node node; - bool is_mapped :1; + /* + * Use a list for now. If searching a mapping ever becomes the + * bottleneck, we should consider using an RB-tree, or even better, + * let the core store drm_gem_object_mapping entries (where we + * could place driver specific data) instead of drm_gem_object ones + * in its drm_file->object_idr table. + * + * struct drm_gem_object_mapping { + * struct drm_gem_object *obj; + * void *driver_priv; + * }; + */ + struct { + struct list_head list; + struct mutex lock; + } mappings; + bool noexec :1; bool is_heap :1; };
+struct panfrost_gem_mapping { + struct list_head node; + struct kref refcount; + struct panfrost_gem_object *obj; + struct drm_mm_node mmnode; + struct panfrost_mmu *mmu; + bool active :1; +}; + static inline struct panfrost_gem_object *to_panfrost_bo(struct drm_gem_object *obj) { return container_of(to_drm_gem_shmem_obj(obj), struct panfrost_gem_object, base); }
-static inline -struct panfrost_gem_object *drm_mm_node_to_panfrost_bo(struct drm_mm_node *node) +static inline struct panfrost_gem_mapping * +drm_mm_node_to_panfrost_mapping(struct drm_mm_node *node) { - return container_of(node, struct panfrost_gem_object, node); + return container_of(node, struct panfrost_gem_mapping, mmnode); }
struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size); @@ -49,6 +72,12 @@ int panfrost_gem_open(struct drm_gem_obj void panfrost_gem_close(struct drm_gem_object *obj, struct drm_file *file_priv);
+struct panfrost_gem_mapping * +panfrost_gem_mapping_get(struct panfrost_gem_object *bo, + struct panfrost_file_priv *priv); +void panfrost_gem_mapping_put(struct panfrost_gem_mapping *mapping); +void panfrost_gem_teardown_mappings(struct panfrost_gem_object *bo); + void panfrost_gem_shrinker_init(struct drm_device *dev); void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c @@ -39,11 +39,12 @@ panfrost_gem_shrinker_count(struct shrin static bool panfrost_gem_purge(struct drm_gem_object *obj) { struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj); + struct panfrost_gem_object *bo = to_panfrost_bo(obj);
if (!mutex_trylock(&shmem->pages_lock)) return false;
- panfrost_mmu_unmap(to_panfrost_bo(obj)); + panfrost_gem_teardown_mappings(bo); drm_gem_shmem_purge_locked(obj);
mutex_unlock(&shmem->pages_lock); --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -269,9 +269,20 @@ static void panfrost_job_cleanup(struct dma_fence_put(job->done_fence); dma_fence_put(job->render_done_fence);
- if (job->bos) { + if (job->mappings) { for (i = 0; i < job->bo_count; i++) + panfrost_gem_mapping_put(job->mappings[i]); + kvfree(job->mappings); + } + + if (job->bos) { + struct panfrost_gem_object *bo; + + for (i = 0; i < job->bo_count; i++) { + bo = to_panfrost_bo(job->bos[i]); drm_gem_object_put_unlocked(job->bos[i]); + } + kvfree(job->bos); }
--- a/drivers/gpu/drm/panfrost/panfrost_job.h +++ b/drivers/gpu/drm/panfrost/panfrost_job.h @@ -32,6 +32,7 @@ struct panfrost_job {
/* Exclusive fences we have taken from the BOs to wait for */ struct dma_fence **implicit_fences; + struct panfrost_gem_mapping **mappings; struct drm_gem_object **bos; u32 bo_count;
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -269,14 +269,15 @@ static int mmu_map_sg(struct panfrost_de return 0; }
-int panfrost_mmu_map(struct panfrost_gem_object *bo) +int panfrost_mmu_map(struct panfrost_gem_mapping *mapping) { + struct panfrost_gem_object *bo = mapping->obj; struct drm_gem_object *obj = &bo->base.base; struct panfrost_device *pfdev = to_panfrost_device(obj->dev); struct sg_table *sgt; int prot = IOMMU_READ | IOMMU_WRITE;
- if (WARN_ON(bo->is_mapped)) + if (WARN_ON(mapping->active)) return 0;
if (bo->noexec) @@ -286,25 +287,28 @@ int panfrost_mmu_map(struct panfrost_gem if (WARN_ON(IS_ERR(sgt))) return PTR_ERR(sgt);
- mmu_map_sg(pfdev, bo->mmu, bo->node.start << PAGE_SHIFT, prot, sgt); - bo->is_mapped = true; + mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT, + prot, sgt); + mapping->active = true;
return 0; }
-void panfrost_mmu_unmap(struct panfrost_gem_object *bo) +void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping) { + struct panfrost_gem_object *bo = mapping->obj; struct drm_gem_object *obj = &bo->base.base; struct panfrost_device *pfdev = to_panfrost_device(obj->dev); - struct io_pgtable_ops *ops = bo->mmu->pgtbl_ops; - u64 iova = bo->node.start << PAGE_SHIFT; - size_t len = bo->node.size << PAGE_SHIFT; + struct io_pgtable_ops *ops = mapping->mmu->pgtbl_ops; + u64 iova = mapping->mmnode.start << PAGE_SHIFT; + size_t len = mapping->mmnode.size << PAGE_SHIFT; size_t unmapped_len = 0;
- if (WARN_ON(!bo->is_mapped)) + if (WARN_ON(!mapping->active)) return;
- dev_dbg(pfdev->dev, "unmap: as=%d, iova=%llx, len=%zx", bo->mmu->as, iova, len); + dev_dbg(pfdev->dev, "unmap: as=%d, iova=%llx, len=%zx", + mapping->mmu->as, iova, len);
while (unmapped_len < len) { size_t unmapped_page; @@ -318,8 +322,9 @@ void panfrost_mmu_unmap(struct panfrost_ unmapped_len += pgsize; }
- panfrost_mmu_flush_range(pfdev, bo->mmu, bo->node.start << PAGE_SHIFT, len); - bo->is_mapped = false; + panfrost_mmu_flush_range(pfdev, mapping->mmu, + mapping->mmnode.start << PAGE_SHIFT, len); + mapping->active = false; }
static void mmu_tlb_inv_context_s1(void *cookie) @@ -394,10 +399,10 @@ void panfrost_mmu_pgtable_free(struct pa free_io_pgtable_ops(mmu->pgtbl_ops); }
-static struct panfrost_gem_object * -addr_to_drm_mm_node(struct panfrost_device *pfdev, int as, u64 addr) +static struct panfrost_gem_mapping * +addr_to_mapping(struct panfrost_device *pfdev, int as, u64 addr) { - struct panfrost_gem_object *bo = NULL; + struct panfrost_gem_mapping *mapping = NULL; struct panfrost_file_priv *priv; struct drm_mm_node *node; u64 offset = addr >> PAGE_SHIFT; @@ -418,8 +423,9 @@ found_mmu: drm_mm_for_each_node(node, &priv->mm) { if (offset >= node->start && offset < (node->start + node->size)) { - bo = drm_mm_node_to_panfrost_bo(node); - drm_gem_object_get(&bo->base.base); + mapping = drm_mm_node_to_panfrost_mapping(node); + + kref_get(&mapping->refcount); break; } } @@ -427,7 +433,7 @@ found_mmu: spin_unlock(&priv->mm_lock); out: spin_unlock(&pfdev->as_lock); - return bo; + return mapping; }
#define NUM_FAULT_PAGES (SZ_2M / PAGE_SIZE) @@ -436,28 +442,30 @@ static int panfrost_mmu_map_fault_addr(s u64 addr) { int ret, i; + struct panfrost_gem_mapping *bomapping; struct panfrost_gem_object *bo; struct address_space *mapping; pgoff_t page_offset; struct sg_table *sgt; struct page **pages;
- bo = addr_to_drm_mm_node(pfdev, as, addr); - if (!bo) + bomapping = addr_to_mapping(pfdev, as, addr); + if (!bomapping) return -ENOENT;
+ bo = bomapping->obj; if (!bo->is_heap) { dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)", - bo->node.start << PAGE_SHIFT); + bomapping->mmnode.start << PAGE_SHIFT); ret = -EINVAL; goto err_bo; } - WARN_ON(bo->mmu->as != as); + WARN_ON(bomapping->mmu->as != as);
/* Assume 2MB alignment and size multiple */ addr &= ~((u64)SZ_2M - 1); page_offset = addr >> PAGE_SHIFT; - page_offset -= bo->node.start; + page_offset -= bomapping->mmnode.start;
mutex_lock(&bo->base.pages_lock);
@@ -509,13 +517,14 @@ static int panfrost_mmu_map_fault_addr(s goto err_map; }
- mmu_map_sg(pfdev, bo->mmu, addr, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); + mmu_map_sg(pfdev, bomapping->mmu, addr, + IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
- bo->is_mapped = true; + bomapping->active = true;
dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
- drm_gem_object_put_unlocked(&bo->base.base); + panfrost_gem_mapping_put(bomapping);
return 0;
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.h +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.h @@ -4,12 +4,12 @@ #ifndef __PANFROST_MMU_H__ #define __PANFROST_MMU_H__
-struct panfrost_gem_object; +struct panfrost_gem_mapping; struct panfrost_file_priv; struct panfrost_mmu;
-int panfrost_mmu_map(struct panfrost_gem_object *bo); -void panfrost_mmu_unmap(struct panfrost_gem_object *bo); +int panfrost_mmu_map(struct panfrost_gem_mapping *mapping); +void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping);
int panfrost_mmu_init(struct panfrost_device *pfdev); void panfrost_mmu_fini(struct panfrost_device *pfdev); --- a/drivers/gpu/drm/panfrost/panfrost_perfcnt.c +++ b/drivers/gpu/drm/panfrost/panfrost_perfcnt.c @@ -25,7 +25,7 @@ #define V4_SHADERS_PER_COREGROUP 4
struct panfrost_perfcnt { - struct panfrost_gem_object *bo; + struct panfrost_gem_mapping *mapping; size_t bosize; void *buf; struct panfrost_file_priv *user; @@ -49,7 +49,7 @@ static int panfrost_perfcnt_dump_locked( int ret;
reinit_completion(&pfdev->perfcnt->dump_comp); - gpuva = pfdev->perfcnt->bo->node.start << PAGE_SHIFT; + gpuva = pfdev->perfcnt->mapping->mmnode.start << PAGE_SHIFT; gpu_write(pfdev, GPU_PERFCNT_BASE_LO, gpuva); gpu_write(pfdev, GPU_PERFCNT_BASE_HI, gpuva >> 32); gpu_write(pfdev, GPU_INT_CLEAR, @@ -89,17 +89,22 @@ static int panfrost_perfcnt_enable_locke if (IS_ERR(bo)) return PTR_ERR(bo);
- perfcnt->bo = to_panfrost_bo(&bo->base); - /* Map the perfcnt buf in the address space attached to file_priv. */ - ret = panfrost_gem_open(&perfcnt->bo->base.base, file_priv); + ret = panfrost_gem_open(&bo->base, file_priv); if (ret) goto err_put_bo;
+ perfcnt->mapping = panfrost_gem_mapping_get(to_panfrost_bo(&bo->base), + user); + if (!perfcnt->mapping) { + ret = -EINVAL; + goto err_close_bo; + } + perfcnt->buf = drm_gem_shmem_vmap(&bo->base); if (IS_ERR(perfcnt->buf)) { ret = PTR_ERR(perfcnt->buf); - goto err_close_bo; + goto err_put_mapping; }
/* @@ -154,12 +159,17 @@ static int panfrost_perfcnt_enable_locke if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186)) gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
+ /* The BO ref is retained by the mapping. */ + drm_gem_object_put_unlocked(&bo->base); + return 0;
err_vunmap: - drm_gem_shmem_vunmap(&perfcnt->bo->base.base, perfcnt->buf); + drm_gem_shmem_vunmap(&bo->base, perfcnt->buf); +err_put_mapping: + panfrost_gem_mapping_put(perfcnt->mapping); err_close_bo: - panfrost_gem_close(&perfcnt->bo->base.base, file_priv); + panfrost_gem_close(&bo->base, file_priv); err_put_bo: drm_gem_object_put_unlocked(&bo->base); return ret; @@ -182,11 +192,11 @@ static int panfrost_perfcnt_disable_lock GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
perfcnt->user = NULL; - drm_gem_shmem_vunmap(&perfcnt->bo->base.base, perfcnt->buf); + drm_gem_shmem_vunmap(&perfcnt->mapping->obj->base.base, perfcnt->buf); perfcnt->buf = NULL; - panfrost_gem_close(&perfcnt->bo->base.base, file_priv); - drm_gem_object_put_unlocked(&perfcnt->bo->base.base); - perfcnt->bo = NULL; + panfrost_gem_close(&perfcnt->mapping->obj->base.base, file_priv); + panfrost_gem_mapping_put(perfcnt->mapping); + perfcnt->mapping = NULL; pm_runtime_mark_last_busy(pfdev->dev); pm_runtime_put_autosuspend(pfdev->dev);
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
commit 5eec71829ad7749a8c918f66a91a9bcf6fb4462a upstream.
In our ABI we have defined I915_ENGINE_CLASS_INVALID_NONE and I915_ENGINE_CLASS_INVALID_VIRTUAL as negative values which creates implicit coupling with type widths used in, also ABI, struct i915_engine_class_instance.
One place where we export engine->uabi_class I915_ENGINE_CLASS_INVALID_VIRTUAL is from our our tracepoints. Because the type of the former is u8 in contrast to u16 defined in the ABI, 254 will be returned instead of 65534 which userspace would legitimately expect.
Another place is I915_CONTEXT_PARAM_ENGINES.
Therefore we need to align the type used to store engine ABI class and instance.
v2: * Update the commit message mentioning get_engines and cc stable. (Chris)
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine") Cc: Chris Wilson chris@chris-wilson.co.uk Cc: stable@vger.kernel.org # v5.3+ Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20200116134508.25211-1-tvrtko.... (cherry picked from commit 0b3bd0cdc329a1e2e00995cffd61aacf58c87cb4) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gem/i915_gem_busy.c | 12 ++++++------ drivers/gpu/drm/i915/gt/intel_engine_types.h | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c @@ -9,16 +9,16 @@ #include "i915_gem_ioctls.h" #include "i915_gem_object.h"
-static __always_inline u32 __busy_read_flag(u8 id) +static __always_inline u32 __busy_read_flag(u16 id) { - if (id == (u8)I915_ENGINE_CLASS_INVALID) + if (id == (u16)I915_ENGINE_CLASS_INVALID) return 0xffff0000u;
GEM_BUG_ON(id >= 16); return 0x10000u << id; }
-static __always_inline u32 __busy_write_id(u8 id) +static __always_inline u32 __busy_write_id(u16 id) { /* * The uABI guarantees an active writer is also amongst the read @@ -29,14 +29,14 @@ static __always_inline u32 __busy_write_ * last_read - hence we always set both read and write busy for * last_write. */ - if (id == (u8)I915_ENGINE_CLASS_INVALID) + if (id == (u16)I915_ENGINE_CLASS_INVALID) return 0xffffffffu;
return (id + 1) | __busy_read_flag(id); }
static __always_inline unsigned int -__busy_set_if_active(const struct dma_fence *fence, u32 (*flag)(u8 id)) +__busy_set_if_active(const struct dma_fence *fence, u32 (*flag)(u16 id)) { const struct i915_request *rq;
@@ -57,7 +57,7 @@ __busy_set_if_active(const struct dma_fe return 0;
/* Beware type-expansion follies! */ - BUILD_BUG_ON(!typecheck(u8, rq->engine->uabi_class)); + BUILD_BUG_ON(!typecheck(u16, rq->engine->uabi_class)); return flag(rq->engine->uabi_class); }
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -300,8 +300,8 @@ struct intel_engine_cs { u8 class; u8 instance;
- u8 uabi_class; - u8 uabi_instance; + u16 uabi_class; + u16 uabi_instance;
u32 context_size; u32 mmio_base;
From: Alexander Potapenko glider@google.com
commit 18451f9f9e5810b8bd1245c5ae166f257e0e2b9d upstream.
Upon resuming from hibernation, free pages may contain stale data from the kernel that initiated the resume. This breaks the invariant inflicted by init_on_free=1 that freed pages must be zeroed.
To deal with this problem, make clear_free_pages() also clear the free pages when init_on_free is enabled.
Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Reported-by: Johannes Stezenbach js@sig21.net Signed-off-by: Alexander Potapenko glider@google.com Cc: 5.3+ stable@vger.kernel.org # 5.3+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/power/snapshot.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
--- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -1147,24 +1147,24 @@ void free_basic_memory_bitmaps(void)
void clear_free_pages(void) { -#ifdef CONFIG_PAGE_POISONING_ZERO struct memory_bitmap *bm = free_pages_map; unsigned long pfn;
if (WARN_ON(!(free_pages_map))) return;
- memory_bm_position_reset(bm); - pfn = memory_bm_next_pfn(bm); - while (pfn != BM_END_OF_MAP) { - if (pfn_valid(pfn)) - clear_highpage(pfn_to_page(pfn)); - + if (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) || want_init_on_free()) { + memory_bm_position_reset(bm); pfn = memory_bm_next_pfn(bm); + while (pfn != BM_END_OF_MAP) { + if (pfn_valid(pfn)) + clear_highpage(pfn_to_page(pfn)); + + pfn = memory_bm_next_pfn(bm); + } + memory_bm_position_reset(bm); + pr_info("free pages cleared after restore\n"); } - memory_bm_position_reset(bm); - pr_info("free pages cleared after restore\n"); -#endif /* PAGE_POISONING_ZERO */ }
/**
From: Masami Hiramatsu mhiramat@kernel.org
commit aeed8aa3874dc15b9d82a6fe796fd7cfbb684448 upstream.
With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings when I ran ftracetest trigger testcases.
----- # dmesg -c > /dev/null # ./ftracetest test.d/trigger ... # dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " " kernel/trace/trace_events_hist.c:6070 kernel/trace/trace_events_hist.c:1760 kernel/trace/trace_events_hist.c:5911 kernel/trace/trace_events_trigger.c:504 kernel/trace/trace_events_hist.c:1810 kernel/trace/trace_events_hist.c:3158 kernel/trace/trace_events_hist.c:3105 kernel/trace/trace_events_hist.c:5518 kernel/trace/trace_events_hist.c:5998 kernel/trace/trace_events_hist.c:6019 kernel/trace/trace_events_hist.c:6044 kernel/trace/trace_events_trigger.c:1500 kernel/trace/trace_events_trigger.c:1540 kernel/trace/trace_events_trigger.c:539 kernel/trace/trace_events_trigger.c:584 -----
I investigated those warnings and found that the RCU-list traversals in event trigger and hist didn't need to use RCU version because those were called only under event_mutex.
I also checked other RCU-list traversals related to event trigger list, and found that most of them were called from event_hist_trigger_func() or hist_unregister_trigger() or register/unregister functions except for a few cases.
Replace these unneeded RCU-list traversals with normal list traversal macro and lockdep_assert_held() to check the event_mutex is held.
Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devno...
Cc: stable@vger.kernel.org Fixes: 30350d65ac567 ("tracing: Add variable support to hist triggers") Reviewed-by: Tom Zanussi zanussi@kernel.org Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace_events_hist.c | 41 ++++++++++++++++++++++++++---------- kernel/trace/trace_events_trigger.c | 20 +++++++++++++---- 2 files changed, 45 insertions(+), 16 deletions(-)
--- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -1766,11 +1766,13 @@ static struct hist_field *find_var(struc struct event_trigger_data *test; struct hist_field *hist_field;
+ lockdep_assert_held(&event_mutex); + hist_field = find_var_field(hist_data, var_name); if (hist_field) return hist_field;
- list_for_each_entry_rcu(test, &file->triggers, list) { + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { test_data = test->private_data; hist_field = find_var_field(test_data, var_name); @@ -1820,7 +1822,9 @@ static struct hist_field *find_file_var( struct event_trigger_data *test; struct hist_field *hist_field;
- list_for_each_entry_rcu(test, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { test_data = test->private_data; hist_field = find_var_field(test_data, var_name); @@ -3115,7 +3119,9 @@ static char *find_trigger_filter(struct { struct event_trigger_data *test;
- list_for_each_entry_rcu(test, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { if (test->private_data == hist_data) return test->filter_str; @@ -3166,9 +3172,11 @@ find_compatible_hist(struct hist_trigger struct event_trigger_data *test; unsigned int n_keys;
+ lockdep_assert_held(&event_mutex); + n_keys = target_hist_data->n_fields - target_hist_data->n_vals;
- list_for_each_entry_rcu(test, &file->triggers, list) { + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { hist_data = test->private_data;
@@ -5528,7 +5536,7 @@ static int hist_show(struct seq_file *m, goto out_unlock; }
- list_for_each_entry_rcu(data, &event_file->triggers, list) { + list_for_each_entry(data, &event_file->triggers, list) { if (data->cmd_ops->trigger_type == ETT_EVENT_HIST) hist_trigger_show(m, data, n++); } @@ -5921,7 +5929,9 @@ static int hist_register_trigger(char *g if (hist_data->attrs->name && !named_data) goto new;
- list_for_each_entry_rcu(test, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { if (!hist_trigger_match(data, test, named_data, false)) continue; @@ -6005,10 +6015,12 @@ static bool have_hist_trigger_match(stru struct event_trigger_data *test, *named_data = NULL; bool match = false;
+ lockdep_assert_held(&event_mutex); + if (hist_data->attrs->name) named_data = find_named_trigger(hist_data->attrs->name);
- list_for_each_entry_rcu(test, &file->triggers, list) { + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { if (hist_trigger_match(data, test, named_data, false)) { match = true; @@ -6026,10 +6038,12 @@ static bool hist_trigger_check_refs(stru struct hist_trigger_data *hist_data = data->private_data; struct event_trigger_data *test, *named_data = NULL;
+ lockdep_assert_held(&event_mutex); + if (hist_data->attrs->name) named_data = find_named_trigger(hist_data->attrs->name);
- list_for_each_entry_rcu(test, &file->triggers, list) { + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { if (!hist_trigger_match(data, test, named_data, false)) continue; @@ -6051,10 +6065,12 @@ static void hist_unregister_trigger(char struct event_trigger_data *test, *named_data = NULL; bool unregistered = false;
+ lockdep_assert_held(&event_mutex); + if (hist_data->attrs->name) named_data = find_named_trigger(hist_data->attrs->name);
- list_for_each_entry_rcu(test, &file->triggers, list) { + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { if (!hist_trigger_match(data, test, named_data, false)) continue; @@ -6080,7 +6096,9 @@ static bool hist_file_check_refs(struct struct hist_trigger_data *hist_data; struct event_trigger_data *test;
- list_for_each_entry_rcu(test, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { hist_data = test->private_data; if (check_var_refs(hist_data)) @@ -6323,7 +6341,8 @@ hist_enable_trigger(struct event_trigger struct enable_trigger_data *enable_data = data->private_data; struct event_trigger_data *test;
- list_for_each_entry_rcu(test, &enable_data->file->triggers, list) { + list_for_each_entry_rcu(test, &enable_data->file->triggers, list, + lockdep_is_held(&event_mutex)) { if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { if (enable_data->enable) test->paused = false; --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -501,7 +501,9 @@ void update_cond_flag(struct trace_event struct event_trigger_data *data; bool set_cond = false;
- list_for_each_entry_rcu(data, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(data, &file->triggers, list) { if (data->filter || event_command_post_trigger(data->cmd_ops) || event_command_needs_rec(data->cmd_ops)) { set_cond = true; @@ -536,7 +538,9 @@ static int register_trigger(char *glob, struct event_trigger_data *test; int ret = 0;
- list_for_each_entry_rcu(test, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(test, &file->triggers, list) { if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) { ret = -EEXIST; goto out; @@ -581,7 +585,9 @@ static void unregister_trigger(char *glo struct event_trigger_data *data; bool unregistered = false;
- list_for_each_entry_rcu(data, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(data, &file->triggers, list) { if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) { unregistered = true; list_del_rcu(&data->list); @@ -1497,7 +1503,9 @@ int event_enable_register_trigger(char * struct event_trigger_data *test; int ret = 0;
- list_for_each_entry_rcu(test, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(test, &file->triggers, list) { test_enable_data = test->private_data; if (test_enable_data && (test->cmd_ops->trigger_type == @@ -1537,7 +1545,9 @@ void event_enable_unregister_trigger(cha struct event_trigger_data *data; bool unregistered = false;
- list_for_each_entry_rcu(data, &file->triggers, list) { + lockdep_assert_held(&event_mutex); + + list_for_each_entry(data, &file->triggers, list) { enable_data = data->private_data; if (enable_data && (data->cmd_ops->trigger_type ==
From: Masami Hiramatsu mhiramat@kernel.org
commit 99c9a923e97a583a38050baa92c9377d73946330 upstream.
Fix double perf_event linking to trace_uprobe_filter on multiple uprobe event by moving trace_uprobe_filter under trace_probe_event.
In uprobe perf event, trace_uprobe_filter data structure is managing target mm filters (in perf_event) related to each uprobe event.
Since commit 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe") left the trace_uprobe_filter data structure in trace_uprobe, if a trace_probe_event has multiple trace_uprobe (multi-probe event), a perf_event is added to different trace_uprobe_filter on each trace_uprobe. This leads a linked list corruption.
To fix this issue, move trace_uprobe_filter to trace_probe_event and link it once on each event instead of each probe.
Link: http://lkml.kernel.org/r/157862073931.1800.3800576241181489174.stgit@devnote...
Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Ingo Molnar mingo@kernel.org Cc: "Naveen N . Rao" naveen.n.rao@linux.ibm.com Cc: Anil S Keshavamurthy anil.s.keshavamurthy@intel.com Cc: "David S . Miller" davem@davemloft.net Cc: Namhyung Kim namhyung@kernel.org Cc: =?utf-8?q?Toke_H=C3=B8iland-J?= =?utf-8?b?w7hyZ2Vuc2Vu?= thoiland@redhat.com Cc: Jean-Tsung Hsiao jhsiao@redhat.com Cc: Jesper Dangaard Brouer brouer@redhat.com Cc: stable@vger.kernel.org Fixes: 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe") Link: https://lkml.kernel.org/r/20200108171611.GA8472@kernel.org Reported-by: Arnaldo Carvalho de Melo acme@kernel.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace_kprobe.c | 2 kernel/trace/trace_probe.c | 5 + kernel/trace/trace_probe.h | 3 - kernel/trace/trace_uprobe.c | 124 ++++++++++++++++++++++++++++---------------- 4 files changed, 86 insertions(+), 48 deletions(-)
--- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -290,7 +290,7 @@ static struct trace_kprobe *alloc_trace_ INIT_HLIST_NODE(&tk->rp.kp.hlist); INIT_LIST_HEAD(&tk->rp.kp.list);
- ret = trace_probe_init(&tk->tp, event, group); + ret = trace_probe_init(&tk->tp, event, group, 0); if (ret < 0) goto error;
--- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -984,7 +984,7 @@ void trace_probe_cleanup(struct trace_pr }
int trace_probe_init(struct trace_probe *tp, const char *event, - const char *group) + const char *group, size_t event_data_size) { struct trace_event_call *call; int ret = 0; @@ -992,7 +992,8 @@ int trace_probe_init(struct trace_probe if (!event || !group) return -EINVAL;
- tp->event = kzalloc(sizeof(struct trace_probe_event), GFP_KERNEL); + tp->event = kzalloc(sizeof(struct trace_probe_event) + event_data_size, + GFP_KERNEL); if (!tp->event) return -ENOMEM;
--- a/kernel/trace/trace_probe.h +++ b/kernel/trace/trace_probe.h @@ -230,6 +230,7 @@ struct trace_probe_event { struct trace_event_call call; struct list_head files; struct list_head probes; + char data[0]; };
struct trace_probe { @@ -322,7 +323,7 @@ static inline bool trace_probe_has_singl }
int trace_probe_init(struct trace_probe *tp, const char *event, - const char *group); + const char *group, size_t event_data_size); void trace_probe_cleanup(struct trace_probe *tp); int trace_probe_append(struct trace_probe *tp, struct trace_probe *to); void trace_probe_unlink(struct trace_probe *tp); --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -60,7 +60,6 @@ static struct dyn_event_operations trace */ struct trace_uprobe { struct dyn_event devent; - struct trace_uprobe_filter filter; struct uprobe_consumer consumer; struct path path; struct inode *inode; @@ -264,6 +263,14 @@ process_fetch_insn(struct fetch_insn *co } NOKPROBE_SYMBOL(process_fetch_insn)
+static struct trace_uprobe_filter * +trace_uprobe_get_filter(struct trace_uprobe *tu) +{ + struct trace_probe_event *event = tu->tp.event; + + return (struct trace_uprobe_filter *)&event->data[0]; +} + static inline void init_trace_uprobe_filter(struct trace_uprobe_filter *filter) { rwlock_init(&filter->rwlock); @@ -351,7 +358,8 @@ alloc_trace_uprobe(const char *group, co if (!tu) return ERR_PTR(-ENOMEM);
- ret = trace_probe_init(&tu->tp, event, group); + ret = trace_probe_init(&tu->tp, event, group, + sizeof(struct trace_uprobe_filter)); if (ret < 0) goto error;
@@ -359,7 +367,7 @@ alloc_trace_uprobe(const char *group, co tu->consumer.handler = uprobe_dispatcher; if (is_ret) tu->consumer.ret_handler = uretprobe_dispatcher; - init_trace_uprobe_filter(&tu->filter); + init_trace_uprobe_filter(trace_uprobe_get_filter(tu)); return tu;
error: @@ -1067,13 +1075,14 @@ static void __probe_event_disable(struct struct trace_probe *pos; struct trace_uprobe *tu;
+ tu = container_of(tp, struct trace_uprobe, tp); + WARN_ON(!uprobe_filter_is_empty(trace_uprobe_get_filter(tu))); + list_for_each_entry(pos, trace_probe_probe_list(tp), list) { tu = container_of(pos, struct trace_uprobe, tp); if (!tu->inode) continue;
- WARN_ON(!uprobe_filter_is_empty(&tu->filter)); - uprobe_unregister(tu->inode, tu->offset, &tu->consumer); tu->inode = NULL; } @@ -1108,7 +1117,7 @@ static int probe_event_enable(struct tra }
tu = container_of(tp, struct trace_uprobe, tp); - WARN_ON(!uprobe_filter_is_empty(&tu->filter)); + WARN_ON(!uprobe_filter_is_empty(trace_uprobe_get_filter(tu)));
if (enabled) return 0; @@ -1205,39 +1214,39 @@ __uprobe_perf_filter(struct trace_uprobe }
static inline bool -uprobe_filter_event(struct trace_uprobe *tu, struct perf_event *event) +trace_uprobe_filter_event(struct trace_uprobe_filter *filter, + struct perf_event *event) { - return __uprobe_perf_filter(&tu->filter, event->hw.target->mm); + return __uprobe_perf_filter(filter, event->hw.target->mm); }
-static int uprobe_perf_close(struct trace_uprobe *tu, struct perf_event *event) +static bool trace_uprobe_filter_remove(struct trace_uprobe_filter *filter, + struct perf_event *event) { bool done;
- write_lock(&tu->filter.rwlock); + write_lock(&filter->rwlock); if (event->hw.target) { list_del(&event->hw.tp_list); - done = tu->filter.nr_systemwide || + done = filter->nr_systemwide || (event->hw.target->flags & PF_EXITING) || - uprobe_filter_event(tu, event); + trace_uprobe_filter_event(filter, event); } else { - tu->filter.nr_systemwide--; - done = tu->filter.nr_systemwide; + filter->nr_systemwide--; + done = filter->nr_systemwide; } - write_unlock(&tu->filter.rwlock); - - if (!done) - return uprobe_apply(tu->inode, tu->offset, &tu->consumer, false); + write_unlock(&filter->rwlock);
- return 0; + return done; }
-static int uprobe_perf_open(struct trace_uprobe *tu, struct perf_event *event) +/* This returns true if the filter always covers target mm */ +static bool trace_uprobe_filter_add(struct trace_uprobe_filter *filter, + struct perf_event *event) { bool done; - int err;
- write_lock(&tu->filter.rwlock); + write_lock(&filter->rwlock); if (event->hw.target) { /* * event->parent != NULL means copy_process(), we can avoid @@ -1247,28 +1256,21 @@ static int uprobe_perf_open(struct trace * attr.enable_on_exec means that exec/mmap will install the * breakpoints we need. */ - done = tu->filter.nr_systemwide || + done = filter->nr_systemwide || event->parent || event->attr.enable_on_exec || - uprobe_filter_event(tu, event); - list_add(&event->hw.tp_list, &tu->filter.perf_events); + trace_uprobe_filter_event(filter, event); + list_add(&event->hw.tp_list, &filter->perf_events); } else { - done = tu->filter.nr_systemwide; - tu->filter.nr_systemwide++; + done = filter->nr_systemwide; + filter->nr_systemwide++; } - write_unlock(&tu->filter.rwlock); + write_unlock(&filter->rwlock);
- err = 0; - if (!done) { - err = uprobe_apply(tu->inode, tu->offset, &tu->consumer, true); - if (err) - uprobe_perf_close(tu, event); - } - return err; + return done; }
-static int uprobe_perf_multi_call(struct trace_event_call *call, - struct perf_event *event, - int (*op)(struct trace_uprobe *tu, struct perf_event *event)) +static int uprobe_perf_close(struct trace_event_call *call, + struct perf_event *event) { struct trace_probe *pos, *tp; struct trace_uprobe *tu; @@ -1278,25 +1280,59 @@ static int uprobe_perf_multi_call(struct if (WARN_ON_ONCE(!tp)) return -ENODEV;
+ tu = container_of(tp, struct trace_uprobe, tp); + if (trace_uprobe_filter_remove(trace_uprobe_get_filter(tu), event)) + return 0; + list_for_each_entry(pos, trace_probe_probe_list(tp), list) { tu = container_of(pos, struct trace_uprobe, tp); - ret = op(tu, event); + ret = uprobe_apply(tu->inode, tu->offset, &tu->consumer, false); if (ret) break; }
return ret; } + +static int uprobe_perf_open(struct trace_event_call *call, + struct perf_event *event) +{ + struct trace_probe *pos, *tp; + struct trace_uprobe *tu; + int err = 0; + + tp = trace_probe_primary_from_call(call); + if (WARN_ON_ONCE(!tp)) + return -ENODEV; + + tu = container_of(tp, struct trace_uprobe, tp); + if (trace_uprobe_filter_add(trace_uprobe_get_filter(tu), event)) + return 0; + + list_for_each_entry(pos, trace_probe_probe_list(tp), list) { + err = uprobe_apply(tu->inode, tu->offset, &tu->consumer, true); + if (err) { + uprobe_perf_close(call, event); + break; + } + } + + return err; +} + static bool uprobe_perf_filter(struct uprobe_consumer *uc, enum uprobe_filter_ctx ctx, struct mm_struct *mm) { + struct trace_uprobe_filter *filter; struct trace_uprobe *tu; int ret;
tu = container_of(uc, struct trace_uprobe, consumer); - read_lock(&tu->filter.rwlock); - ret = __uprobe_perf_filter(&tu->filter, mm); - read_unlock(&tu->filter.rwlock); + filter = trace_uprobe_get_filter(tu); + + read_lock(&filter->rwlock); + ret = __uprobe_perf_filter(filter, mm); + read_unlock(&filter->rwlock);
return ret; } @@ -1419,10 +1455,10 @@ trace_uprobe_register(struct trace_event return 0;
case TRACE_REG_PERF_OPEN: - return uprobe_perf_multi_call(event, data, uprobe_perf_open); + return uprobe_perf_open(event, data);
case TRACE_REG_PERF_CLOSE: - return uprobe_perf_multi_call(event, data, uprobe_perf_close); + return uprobe_perf_close(event, data);
#endif default:
From: Masami Ichikawa masami256@gmail.com
commit bf24daac8f2bd5b8affaec03c2be1d20bcdd6837 upstream.
When trace_clock option is not set and unstable clcok detected, tracing_set_default_clock() sets trace_clock(ThinkPad A285 is one of case). In that case, if lockdown is in effect, null pointer dereference error happens in ring_buffer_set_clock().
Link: http://lkml.kernel.org/r/20200116131236.3866925-1-masami256@gmail.com
Cc: stable@vger.kernel.org Fixes: 17911ff38aa58 ("tracing: Add locked_down checks to the open calls of files created for tracefs") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1788488 Signed-off-by: Masami Ichikawa masami256@gmail.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -9270,6 +9270,11 @@ __init static int tracing_set_default_cl { /* sched_clock_stable() is determined in late_initcall */ if (!trace_boot_clock && !sched_clock_stable()) { + if (security_locked_down(LOCKDOWN_TRACEFS)) { + pr_warn("Can not set tracing clock due to lockdown\n"); + return -EPERM; + } + printk(KERN_WARNING "Unstable clock detected, switching default tracing clock to "global"\n" "If you want to keep using the local clock, then add:\n"
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 8bcebc77e85f3d7536f96845a0fe94b1dddb6af0 upstream.
While working on a tool to convert SQL syntex into the histogram language of the kernel, I discovered the following bug:
# echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events # echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger # echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
Would not display any histograms in the sched_switch histogram side.
But if I were to swap the location of
"delta=common_timestamp-$start" with "start2=$start"
Such that the last line had:
# echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
The histogram works as expected.
What I found out is that the expressions clear out the value once it is resolved. As the variables are resolved in the order listed, when processing:
delta=common_timestamp-$start
The $start is cleared. When it gets to "start2=$start", it errors out with "unresolved symbol" (which is silent as this happens at the location of the trace), and the histogram is dropped.
When processing the histogram for variable references, instead of adding a new reference for a variable used twice, use the same reference. That way, not only is it more efficient, but the order will no longer matter in processing of the variables.
From Tom Zanussi:
"Just to clarify some more about what the problem was is that without your patch, we would have two separate references to the same variable, and during resolve_var_refs(), they'd both want to be resolved separately, so in this case, since the first reference to start wasn't part of an expression, it wouldn't get the read-once flag set, so would be read normally, and then the second reference would do the read-once read and also be read but using read-once. So everything worked and you didn't see a problem:
from: start2=$start,delta=common_timestamp-$start
In the second case, when you switched them around, the first reference would be resolved by doing the read-once, and following that the second reference would try to resolve and see that the variable had already been read, so failed as unset, which caused it to short-circuit out and not do the trigger action to generate the synthetic event:
to: delta=common_timestamp-$start,start2=$start
With your patch, we only have the single resolution which happens correctly the one time it's resolved, so this can't happen."
Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home
Cc: stable@vger.kernel.org Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers") Reviewed-by: Tom Zanuss zanussi@kernel.org Tested-by: Tom Zanussi zanussi@kernel.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace_events_hist.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
--- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -116,6 +116,7 @@ struct hist_field { struct ftrace_event_field *field; unsigned long flags; hist_field_fn_t fn; + unsigned int ref; unsigned int size; unsigned int offset; unsigned int is_signed; @@ -2427,8 +2428,16 @@ static int contains_operator(char *str) return field_op; }
+static void get_hist_field(struct hist_field *hist_field) +{ + hist_field->ref++; +} + static void __destroy_hist_field(struct hist_field *hist_field) { + if (--hist_field->ref > 1) + return; + kfree(hist_field->var.name); kfree(hist_field->name); kfree(hist_field->type); @@ -2470,6 +2479,8 @@ static struct hist_field *create_hist_fi if (!hist_field) return NULL;
+ hist_field->ref = 1; + hist_field->hist_data = hist_data;
if (flags & HIST_FIELD_FL_EXPR || flags & HIST_FIELD_FL_ALIAS) @@ -2665,6 +2676,17 @@ static struct hist_field *create_var_ref { unsigned long flags = HIST_FIELD_FL_VAR_REF; struct hist_field *ref_field; + int i; + + /* Check if the variable already exists */ + for (i = 0; i < hist_data->n_var_refs; i++) { + ref_field = hist_data->var_refs[i]; + if (ref_field->var.idx == var_field->var.idx && + ref_field->var.hist_data == var_field->hist_data) { + get_hist_field(ref_field); + return ref_field; + } + }
ref_field = create_hist_field(var_field->hist_data, NULL, flags, NULL); if (ref_field) {
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit 5d2e5dd5849b4ef5e8ec35e812cdb732c13cd27e upstream.
Commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") has a bug in the definition of MIN_USER_CONTEXT.
The result is that the context id used for the vmemmap and the lowest context id handed out to userspace are the same. The context id is essentially the process identifier as far as the first stage of the MMU translation is concerned.
This can result in multiple SLB entries with the same VSID (Virtual Segment ID), accessible to the kernel and some random userspace process that happens to get the overlapping id, which is not expected eg:
07 c00c000008000000 40066bdea7000500 1T ESID= c00c00 VSID= 66bdea7 LLP:100 12 0002000008000000 40066bdea7000d80 1T ESID= 200 VSID= 66bdea7 LLP:100
Even though the user process and the kernel use the same VSID, the permissions in the hash page table prevent the user process from reading or writing to any kernel mappings.
It can also lead to SLB entries with different base page size encodings (LLP), eg:
05 c00c000008000000 00006bde0053b500 256M ESID=c00c00000 VSID= 6bde0053b LLP:100 09 0000000008000000 00006bde0053bc80 256M ESID= 0 VSID= 6bde0053b LLP: 0
Such SLB entries can result in machine checks, eg. as seen on a G5:
Oops: Machine check, sig: 7 [#1] BE PAGE SIZE=64K MU-Hash SMP NR_CPUS=4 NUMA Power Mac NIP: c00000000026f248 LR: c000000000295e58 CTR: 0000000000000000 REGS: c0000000erfd3d70 TRAP: 0200 Tainted: G M (5.5.0-rcl-gcc-8.2.0-00010-g228b667d8ea1) MSR: 9000000000109032 <SF,HV,EE,ME,IR,DR,RI> CR: 24282048 XER: 00000000 DAR: c00c000000612c80 DSISR: 00000400 IRQMASK: 0 ... NIP [c00000000026f248] .kmem_cache_free+0x58/0x140 LR [c088000008295e58] .putname 8x88/0xa Call Trace: .putname+0xB8/0xa .filename_lookup.part.76+0xbe/0x160 .do_faccessat+0xe0/0x380 system_call+0x5c/ex68
This happens with 256MB segments and 64K pages, as the duplicate VSID is hit with the first vmemmap segment and the first user segment, and older 32-bit userspace maps things in the first user segment.
On other CPUs a machine check is not seen. Instead the userspace process can get stuck continuously faulting, with the fault never properly serviced, due to the kernel not understanding that there is already a HPTE for the address but with inaccessible permissions.
On machines with 1T segments we've not seen the bug hit other than by deliberately exercising it. That seems to be just a matter of luck though, due to the typical layout of the user virtual address space and the ranges of vmemmap that are typically populated.
To fix it we add 2 to MIN_USER_CONTEXT. This ensures the lowest context given to userspace doesn't overlap with the VMEMMAP context, or with the context for INVALID_REGION_ID.
Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Christian Marillat marillat@debian.org Reported-by: Romain Dolbeau romain@dolbeau.org Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com [mpe: Account for INVALID_REGION_ID, mostly rewrite change log] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20200123102547.11623-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -600,8 +600,11 @@ extern void slb_set_size(u16 size); * */ #define MAX_USER_CONTEXT ((ASM_CONST(1) << CONTEXT_BITS) - 2) + +// The + 2 accounts for INVALID_REGION and 1 more to avoid overlap with kernel #define MIN_USER_CONTEXT (MAX_KERNEL_CTX_CNT + MAX_VMALLOC_CTX_CNT + \ - MAX_IO_CTX_CNT + MAX_VMEMMAP_CTX_CNT) + MAX_IO_CTX_CNT + MAX_VMEMMAP_CTX_CNT + 2) + /* * For platforms that support on 65bit VA we limit the context bits */
From: Frederic Barrat fbarrat@linux.ibm.com
commit 17328f218fb760c9c6accc5b52494889243a6b98 upstream.
A load on an ESB page returning all 1's means that the underlying device has invalidated the access to the PQ state of the interrupt through mmio. It may happen, for example when querying a PHB interrupt while the PHB is in an error state.
In that case, we should consider the interrupt to be invalid when checking its state in the irq_get_irqchip_state() handler.
Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Frederic Barrat fbarrat@linux.ibm.com [clg: wrote a commit log, introduced XIVE_ESB_INVALID ] Signed-off-by: Cédric Le Goater clg@kaod.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20200113130118.27969-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/include/asm/xive-regs.h | 1 + arch/powerpc/sysdev/xive/common.c | 15 ++++++++++++--- 2 files changed, 13 insertions(+), 3 deletions(-)
--- a/arch/powerpc/include/asm/xive-regs.h +++ b/arch/powerpc/include/asm/xive-regs.h @@ -39,6 +39,7 @@
#define XIVE_ESB_VAL_P 0x2 #define XIVE_ESB_VAL_Q 0x1 +#define XIVE_ESB_INVALID 0xFF
/* * Thread Management (aka "TM") registers --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -972,12 +972,21 @@ static int xive_get_irqchip_state(struct enum irqchip_irq_state which, bool *state) { struct xive_irq_data *xd = irq_data_get_irq_handler_data(data); + u8 pq;
switch (which) { case IRQCHIP_STATE_ACTIVE: - *state = !xd->stale_p && - (xd->saved_p || - !!(xive_esb_read(xd, XIVE_ESB_GET) & XIVE_ESB_VAL_P)); + pq = xive_esb_read(xd, XIVE_ESB_GET); + + /* + * The esb value being all 1's means we couldn't get + * the PQ state of the interrupt through mmio. It may + * happen, for example when querying a PHB interrupt + * while the PHB is in an error state. We consider the + * interrupt to be inactive in that case. + */ + *state = (pq != XIVE_ESB_INVALID) && !xd->stale_p && + (xd->saved_p || !!(pq & XIVE_ESB_VAL_P)); return 0; default: return -EINVAL;
From: Mehmet Akif Tasova makiftasova@gmail.com
commit 205608749e1ef394f513888091e613c5bfccbcca upstream.
Since v5.4-rc1 was released, iwlwifi started throwing errors when scan commands were sent to the firmware with certain devices (depending on the OTP burned in the device, which contains the list of available channels). For instance:
iwlwifi 0000:00:14.3: FW error in SYNC CMD SCAN_CFG_CMD
This bug was reported in the ArchLinux bug tracker: https://bugs.archlinux.org/task/64703
And also in a specific case in bugzilla, when the lar_disabled option was set: https://bugzilla.kernel.org/show_bug.cgi?id=205193
Revert the commit that introduced this error, by using the number of channels from the OTP instead of the number of channels that is specified in the FW TLV that tells us how many channels it supports.
This reverts commit 06eb547c4ae4382e70d556ba213d13c95ca1801b.
Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Mehmet Akif Tasova makiftasova@gmail.com [ Luca: reworded the commit message a bit. ] Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1220,7 +1220,7 @@ static int iwl_mvm_legacy_config_scan(st cmd_size = sizeof(struct iwl_scan_config_v2); else cmd_size = sizeof(struct iwl_scan_config_v1); - cmd_size += num_channels; + cmd_size += mvm->fw->ucode_capa.n_scan_channels;
cfg = kzalloc(cmd_size, GFP_KERNEL); if (!cfg)
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
commit d829229e35f302fd49c052b5c5906c90ecf9911d upstream.
The purpose of this was to keep all the queues updated with the Rx sequence numbers because unlikely yet possible situations where queues can't understand if a specific packet needs to be dropped or not.
Unfortunately, it was reported that this caused issues in our DMA engine. We don't fully understand how this is related, but this is being currently debugged. For now, just don't send this notification to the Rx queues. This de-facto reverts my commit 3c514bf831ac12356b695ff054bef641b9e99593:
iwlwifi: mvm: add a loose synchronization of the NSSN across Rx queues
This issue was reported here: https://bugzilla.kernel.org/show_bug.cgi?id=204873 https://bugzilla.kernel.org/show_bug.cgi?id=205001 and others maybe.
Fixes: 3c514bf831ac ("iwlwifi: mvm: add a loose synchronization of the NSSN across Rx queues") CC: stable@vger.kernel.org # 5.3+ Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/intel/iwlwifi/mvm/constants.h | 1 + drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 17 ++++++++++------- 2 files changed, 11 insertions(+), 7 deletions(-)
--- a/drivers/net/wireless/intel/iwlwifi/mvm/constants.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/constants.h @@ -154,5 +154,6 @@ #define IWL_MVM_D3_DEBUG false #define IWL_MVM_USE_TWT false #define IWL_MVM_AMPDU_CONSEC_DROPS_DELBA 10 +#define IWL_MVM_USE_NSSN_SYNC 0
#endif /* __MVM_CONSTANTS_H */ --- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -514,14 +514,17 @@ static bool iwl_mvm_is_sn_less(u16 sn1,
static void iwl_mvm_sync_nssn(struct iwl_mvm *mvm, u8 baid, u16 nssn) { - struct iwl_mvm_rss_sync_notif notif = { - .metadata.type = IWL_MVM_RXQ_NSSN_SYNC, - .metadata.sync = 0, - .nssn_sync.baid = baid, - .nssn_sync.nssn = nssn, - }; + if (IWL_MVM_USE_NSSN_SYNC) { + struct iwl_mvm_rss_sync_notif notif = { + .metadata.type = IWL_MVM_RXQ_NSSN_SYNC, + .metadata.sync = 0, + .nssn_sync.baid = baid, + .nssn_sync.nssn = nssn, + };
- iwl_mvm_sync_rx_queues_internal(mvm, (void *)¬if, sizeof(notif)); + iwl_mvm_sync_rx_queues_internal(mvm, (void *)¬if, + sizeof(notif)); + } }
#define RX_REORDER_BUF_TIMEOUT_MQ (HZ / 10)
From: Matthew Wilcox (Oracle) willy@infradead.org
commit 430f24f94c8a174d411a550d7b5529301922e67a upstream.
If there is an entry at ULONG_MAX, xa_for_each() will overflow the 'index + 1' in xa_find_after() and wrap around to 0. Catch this case and terminate the loop by returning NULL.
Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- lib/test_xarray.c | 17 +++++++++++++++++ lib/xarray.c | 3 +++ 2 files changed, 20 insertions(+)
--- a/lib/test_xarray.c +++ b/lib/test_xarray.c @@ -1046,11 +1046,28 @@ static noinline void check_find_3(struct xa_destroy(xa); }
+static noinline void check_find_4(struct xarray *xa) +{ + unsigned long index = 0; + void *entry; + + xa_store_index(xa, ULONG_MAX, GFP_KERNEL); + + entry = xa_find_after(xa, &index, ULONG_MAX, XA_PRESENT); + XA_BUG_ON(xa, entry != xa_mk_index(ULONG_MAX)); + + entry = xa_find_after(xa, &index, ULONG_MAX, XA_PRESENT); + XA_BUG_ON(xa, entry); + + xa_erase_index(xa, ULONG_MAX); +} + static noinline void check_find(struct xarray *xa) { check_find_1(xa); check_find_2(xa); check_find_3(xa); + check_find_4(xa); check_multi_find(xa); check_multi_find_2(xa); } --- a/lib/xarray.c +++ b/lib/xarray.c @@ -1847,6 +1847,9 @@ void *xa_find_after(struct xarray *xa, u XA_STATE(xas, xa, *indexp + 1); void *entry;
+ if (xas.xa_index == 0) + return NULL; + rcu_read_lock(); for (;;) { if ((__force unsigned int)filter < XA_MAX_MARKS)
From: Matthew Wilcox (Oracle) willy@infradead.org
commit 19c30f4dd0923ef191f35c652ee4058e91e89056 upstream.
If the entry is of an order which is a multiple of XA_CHUNK_SIZE, the current detection of sibling entries does not work. Factor out an xas_sibling() function to make xa_find_after() a little more understandable, and write a new implementation that doesn't suffer from the same bug.
Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- lib/test_xarray.c | 32 +++++++++++++++++++------------- lib/xarray.c | 20 +++++++++++++------- 2 files changed, 32 insertions(+), 20 deletions(-)
--- a/lib/test_xarray.c +++ b/lib/test_xarray.c @@ -902,28 +902,30 @@ static noinline void check_store_iter(st XA_BUG_ON(xa, !xa_empty(xa)); }
-static noinline void check_multi_find(struct xarray *xa) +static noinline void check_multi_find_1(struct xarray *xa, unsigned order) { #ifdef CONFIG_XARRAY_MULTI + unsigned long multi = 3 << order; + unsigned long next = 4 << order; unsigned long index;
- xa_store_order(xa, 12, 2, xa_mk_value(12), GFP_KERNEL); - XA_BUG_ON(xa, xa_store_index(xa, 16, GFP_KERNEL) != NULL); + xa_store_order(xa, multi, order, xa_mk_value(multi), GFP_KERNEL); + XA_BUG_ON(xa, xa_store_index(xa, next, GFP_KERNEL) != NULL);
index = 0; XA_BUG_ON(xa, xa_find(xa, &index, ULONG_MAX, XA_PRESENT) != - xa_mk_value(12)); - XA_BUG_ON(xa, index != 12); - index = 13; + xa_mk_value(multi)); + XA_BUG_ON(xa, index != multi); + index = multi + 1; XA_BUG_ON(xa, xa_find(xa, &index, ULONG_MAX, XA_PRESENT) != - xa_mk_value(12)); - XA_BUG_ON(xa, (index < 12) || (index >= 16)); + xa_mk_value(multi)); + XA_BUG_ON(xa, (index < multi) || (index >= next)); XA_BUG_ON(xa, xa_find_after(xa, &index, ULONG_MAX, XA_PRESENT) != - xa_mk_value(16)); - XA_BUG_ON(xa, index != 16); + xa_mk_value(next)); + XA_BUG_ON(xa, index != next);
- xa_erase_index(xa, 12); - xa_erase_index(xa, 16); + xa_erase_index(xa, multi); + xa_erase_index(xa, next); XA_BUG_ON(xa, !xa_empty(xa)); #endif } @@ -1064,11 +1066,15 @@ static noinline void check_find_4(struct
static noinline void check_find(struct xarray *xa) { + unsigned i; + check_find_1(xa); check_find_2(xa); check_find_3(xa); check_find_4(xa); - check_multi_find(xa); + + for (i = 2; i < 10; i++) + check_multi_find_1(xa, i); check_multi_find_2(xa); }
--- a/lib/xarray.c +++ b/lib/xarray.c @@ -1824,6 +1824,17 @@ void *xa_find(struct xarray *xa, unsigne } EXPORT_SYMBOL(xa_find);
+static bool xas_sibling(struct xa_state *xas) +{ + struct xa_node *node = xas->xa_node; + unsigned long mask; + + if (!node) + return false; + mask = (XA_CHUNK_SIZE << node->shift) - 1; + return (xas->xa_index & mask) > (xas->xa_offset << node->shift); +} + /** * xa_find_after() - Search the XArray for a present entry. * @xa: XArray. @@ -1858,13 +1869,8 @@ void *xa_find_after(struct xarray *xa, u entry = xas_find(&xas, max); if (xas.xa_node == XAS_BOUNDS) break; - if (xas.xa_shift) { - if (xas.xa_index & ((1UL << xas.xa_shift) - 1)) - continue; - } else { - if (xas.xa_offset < (xas.xa_index & XA_CHUNK_MASK)) - continue; - } + if (xas_sibling(&xas)) + continue; if (!xas_retry(&xas, entry)) break; }
From: Matthew Wilcox (Oracle) willy@infradead.org
commit c44aa5e8ab58b5f4cf473970ec784c3333496a2e upstream.
If you call xas_find() with the initial index > max, it should have returned NULL but was returning the entry at index.
Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- lib/test_xarray.c | 5 +++++ lib/xarray.c | 10 ++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-)
--- a/lib/test_xarray.c +++ b/lib/test_xarray.c @@ -2,6 +2,7 @@ /* * test_xarray.c: Test the XArray API * Copyright (c) 2017-2018 Microsoft Corporation + * Copyright (c) 2019-2020 Oracle * Author: Matthew Wilcox willy@infradead.org */
@@ -911,6 +912,7 @@ static noinline void check_multi_find_1(
xa_store_order(xa, multi, order, xa_mk_value(multi), GFP_KERNEL); XA_BUG_ON(xa, xa_store_index(xa, next, GFP_KERNEL) != NULL); + XA_BUG_ON(xa, xa_store_index(xa, next + 1, GFP_KERNEL) != NULL);
index = 0; XA_BUG_ON(xa, xa_find(xa, &index, ULONG_MAX, XA_PRESENT) != @@ -923,9 +925,12 @@ static noinline void check_multi_find_1( XA_BUG_ON(xa, xa_find_after(xa, &index, ULONG_MAX, XA_PRESENT) != xa_mk_value(next)); XA_BUG_ON(xa, index != next); + XA_BUG_ON(xa, xa_find_after(xa, &index, next, XA_PRESENT) != NULL); + XA_BUG_ON(xa, index != next);
xa_erase_index(xa, multi); xa_erase_index(xa, next); + xa_erase_index(xa, next + 1); XA_BUG_ON(xa, !xa_empty(xa)); #endif } --- a/lib/xarray.c +++ b/lib/xarray.c @@ -1,7 +1,8 @@ // SPDX-License-Identifier: GPL-2.0+ /* * XArray implementation - * Copyright (c) 2017 Microsoft Corporation + * Copyright (c) 2017-2018 Microsoft Corporation + * Copyright (c) 2018-2020 Oracle * Author: Matthew Wilcox willy@infradead.org */
@@ -1081,6 +1082,8 @@ void *xas_find(struct xa_state *xas, uns
if (xas_error(xas)) return NULL; + if (xas->xa_index > max) + return set_bounds(xas);
if (!xas->xa_node) { xas->xa_index = 1; @@ -1150,6 +1153,8 @@ void *xas_find_marked(struct xa_state *x
if (xas_error(xas)) return NULL; + if (xas->xa_index > max) + goto max;
if (!xas->xa_node) { xas->xa_index = 1; @@ -1867,7 +1872,8 @@ void *xa_find_after(struct xarray *xa, u entry = xas_find_marked(&xas, max, filter); else entry = xas_find(&xas, max); - if (xas.xa_node == XAS_BOUNDS) + + if (xas_invalid(&xas)) break; if (xas_sibling(&xas)) continue;
From: Boyan Ding boyan.j.ding@gmail.com
commit 9608ea6c6613ced75b2c41703d99f44e6f8849f1 upstream.
Commit 179e5a6114cc ("pinctrl: intel: Remove default Interrupt Status offset") removes default interrupt status offset of GPIO controllers, with previous commits explicitly providing the previously default offsets. However, the is_offset value in SPTH_COMMUNITY is missing, preventing related irq from being properly detected and handled.
Fixes: f702e0b93cdb ("pinctrl: sunrisepoint: Provide Interrupt Status register offset") Link: https://bugzilla.kernel.org/show_bug.cgi?id=205745 Cc: stable@vger.kernel.org Signed-off-by: Boyan Ding boyan.j.ding@gmail.com Acked-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pinctrl/intel/pinctrl-sunrisepoint.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/pinctrl/intel/pinctrl-sunrisepoint.c +++ b/drivers/pinctrl/intel/pinctrl-sunrisepoint.c @@ -49,6 +49,7 @@ .padown_offset = SPT_PAD_OWN, \ .padcfglock_offset = SPT_PADCFGLOCK, \ .hostown_offset = SPT_HOSTSW_OWN, \ + .is_offset = SPT_GPI_IS, \ .ie_offset = SPT_GPI_IE, \ .pin_base = (s), \ .npins = ((e) - (s) + 1), \
From: Jerry Snitselaar jsnitsel@redhat.com
commit bf708cfb2f4811d1948a88c41ab96587e84ad344 upstream.
It is possible for archdata.iommu to be set to DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for those values before calling __dmar_remove_one_dev_info. Without a check it can result in a null pointer dereference. This has been seen while booting a kdump kernel on an HP dl380 gen9.
Cc: Joerg Roedel joro@8bytes.org Cc: Lu Baolu baolu.lu@linux.intel.com Cc: David Woodhouse dwmw2@infradead.org Cc: stable@vger.kernel.org # 5.3+ Cc: linux-kernel@vger.kernel.org Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one") Signed-off-by: Jerry Snitselaar jsnitsel@redhat.com Acked-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/iommu/intel-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5132,7 +5132,8 @@ static void dmar_remove_one_dev_info(str
spin_lock_irqsave(&device_domain_lock, flags); info = dev->archdata.iommu; - if (info) + if (info && info != DEFER_DEVICE_DOMAIN_INFO + && info != DUMMY_DEVICE_DOMAIN_INFO) __dmar_remove_one_dev_info(info); spin_unlock_irqrestore(&device_domain_lock, flags); }
From: Johan Hovold johan@kernel.org
commit ba9a103f40fc4a3ec7558ec9b0b97d4f92034249 upstream.
The driver was issuing synchronous uninterruptible control requests without using a timeout. This could lead to the driver hanging on probe due to a malfunctioning (or malicious) device until the device is physically disconnected. While sleeping in probe the driver prevents other devices connected to the same hub from being added to (or removed from) the bus.
The USB upper limit of five seconds per request should be more than enough.
Fixes: 99f83c9c9ac9 ("[PATCH] USB: add driver for Keyspan Digital Remote") Signed-off-by: Johan Hovold johan@kernel.org Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: stable stable@vger.kernel.org # 2.6.13 Link: https://lore.kernel.org/r/20200113171715.30621-1-johan@kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/misc/keyspan_remote.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/input/misc/keyspan_remote.c +++ b/drivers/input/misc/keyspan_remote.c @@ -336,7 +336,8 @@ static int keyspan_setup(struct usb_devi int retval = 0;
retval = usb_control_msg(dev, usb_sndctrlpipe(dev, 0), - 0x11, 0x40, 0x5601, 0x0, NULL, 0, 0); + 0x11, 0x40, 0x5601, 0x0, NULL, 0, + USB_CTRL_SET_TIMEOUT); if (retval) { dev_dbg(&dev->dev, "%s - failed to set bit rate due to error: %d\n", __func__, retval); @@ -344,7 +345,8 @@ static int keyspan_setup(struct usb_devi }
retval = usb_control_msg(dev, usb_sndctrlpipe(dev, 0), - 0x44, 0x40, 0x0, 0x0, NULL, 0, 0); + 0x44, 0x40, 0x0, 0x0, NULL, 0, + USB_CTRL_SET_TIMEOUT); if (retval) { dev_dbg(&dev->dev, "%s - failed to set resume sensitivity due to error: %d\n", __func__, retval); @@ -352,7 +354,8 @@ static int keyspan_setup(struct usb_devi }
retval = usb_control_msg(dev, usb_sndctrlpipe(dev, 0), - 0x22, 0x40, 0x0, 0x0, NULL, 0, 0); + 0x22, 0x40, 0x0, 0x0, NULL, 0, + USB_CTRL_SET_TIMEOUT); if (retval) { dev_dbg(&dev->dev, "%s - failed to turn receive on due to error: %d\n", __func__, retval);
From: Hans Verkuil hverkuil-cisco@xs4all.nl
commit 8ff771f8c8d55d95f102cf88a970e541a8bd6bcf upstream.
This reverts commit a284e11c371e446371675668d8c8120a27227339.
This causes problems (drifting cursor) with at least the F11 function that reads more than 32 bytes.
The real issue is in the F54 driver, and so this should be fixed there, and not in rmi_smbus.c.
So first revert this bad commit, then fix the real problem in F54 in another patch.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Reported-by: Timo Kaufmann timokau@zoho.com Fixes: a284e11c371e ("Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200115124819.3191024-2-hverkuil-cisco@xs4all.nl Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/rmi4/rmi_smbus.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/input/rmi4/rmi_smbus.c +++ b/drivers/input/rmi4/rmi_smbus.c @@ -163,6 +163,7 @@ static int rmi_smb_write_block(struct rm /* prepare to write next block of bytes */ cur_len -= SMB_MAX_COUNT; databuff += SMB_MAX_COUNT; + rmiaddr += SMB_MAX_COUNT; } exit: mutex_unlock(&rmi_smb->page_mutex); @@ -214,6 +215,7 @@ static int rmi_smb_read_block(struct rmi /* prepare to read next block of bytes */ cur_len -= SMB_MAX_COUNT; databuff += SMB_MAX_COUNT; + rmiaddr += SMB_MAX_COUNT; }
retval = 0;
From: Alex Sverdlin alexander.sverdlin@nokia.com
commit 927d780ee371d7e121cea4fc7812f6ef2cea461c upstream.
Scenario 1, ARMv7 =================
If code in arch/arm/kernel/ftrace.c would operate on mcount() pointer the following may be generated:
00000230 <prealloc_fixed_plts>: 230: b5f8 push {r3, r4, r5, r6, r7, lr} 232: b500 push {lr} 234: f7ff fffe bl 0 <__gnu_mcount_nc> 234: R_ARM_THM_CALL __gnu_mcount_nc 238: f240 0600 movw r6, #0 238: R_ARM_THM_MOVW_ABS_NC __gnu_mcount_nc 23c: f8d0 1180 ldr.w r1, [r0, #384] ; 0x180
FTRACE currently is not able to deal with it:
WARNING: CPU: 0 PID: 0 at .../kernel/trace/ftrace.c:1979 ftrace_bug+0x1ad/0x230() ... CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.116-... #1 ... [<c0314e3d>] (unwind_backtrace) from [<c03115e9>] (show_stack+0x11/0x14) [<c03115e9>] (show_stack) from [<c051a7f1>] (dump_stack+0x81/0xa8) [<c051a7f1>] (dump_stack) from [<c0321c5d>] (warn_slowpath_common+0x69/0x90) [<c0321c5d>] (warn_slowpath_common) from [<c0321cf3>] (warn_slowpath_null+0x17/0x1c) [<c0321cf3>] (warn_slowpath_null) from [<c038ee9d>] (ftrace_bug+0x1ad/0x230) [<c038ee9d>] (ftrace_bug) from [<c038f1f9>] (ftrace_process_locs+0x27d/0x444) [<c038f1f9>] (ftrace_process_locs) from [<c08915bd>] (ftrace_init+0x91/0xe8) [<c08915bd>] (ftrace_init) from [<c0885a67>] (start_kernel+0x34b/0x358) [<c0885a67>] (start_kernel) from [<00308095>] (0x308095) ---[ end trace cb88537fdc8fa200 ]--- ftrace failed to modify [<c031266c>] prealloc_fixed_plts+0x8/0x60 actual: 44:f2:e1:36 ftrace record flags: 0 (0) expected tramp: c03143e9
Scenario 2, ARMv4T ==================
ftrace: allocating 14435 entries in 43 pages ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2029 ftrace_bug+0x204/0x310 CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.5 #1 Hardware name: Cirrus Logic EDB9302 Evaluation Board [<c0010a24>] (unwind_backtrace) from [<c000ecb0>] (show_stack+0x20/0x2c) [<c000ecb0>] (show_stack) from [<c03c72e8>] (dump_stack+0x20/0x30) [<c03c72e8>] (dump_stack) from [<c0021c18>] (__warn+0xdc/0x104) [<c0021c18>] (__warn) from [<c0021d7c>] (warn_slowpath_null+0x4c/0x5c) [<c0021d7c>] (warn_slowpath_null) from [<c0095360>] (ftrace_bug+0x204/0x310) [<c0095360>] (ftrace_bug) from [<c04dabac>] (ftrace_init+0x3b4/0x4d4) [<c04dabac>] (ftrace_init) from [<c04cef4c>] (start_kernel+0x20c/0x410) [<c04cef4c>] (start_kernel) from [<00000000>] ( (null)) ---[ end trace 0506a2f5dae6b341 ]--- ftrace failed to modify [<c000c350>] perf_trace_sys_exit+0x5c/0xe8 actual: 1e:ff:2f:e1 Initializing ftrace call sites ftrace record flags: 0 (0) expected tramp: c000fb24
The analysis for this problem has been already performed previously, refer to the link below.
Fix the above problems by allowing only selected reloc types in __mcount_loc. The list itself comes from the legacy recordmcount.pl script.
Link: https://lore.kernel.org/lkml/56961010.6000806@pengutronix.de/ Cc: stable@vger.kernel.org Fixes: ed60453fa8f8 ("ARM: 6511/1: ftrace: add ARM support for C version of recordmcount") Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com Acked-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- scripts/recordmcount.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
--- a/scripts/recordmcount.c +++ b/scripts/recordmcount.c @@ -38,6 +38,10 @@ #define R_AARCH64_ABS64 257 #endif
+#define R_ARM_PC24 1 +#define R_ARM_THM_CALL 10 +#define R_ARM_CALL 28 + static int fd_map; /* File descriptor for file being modified. */ static int mmap_failed; /* Boolean flag. */ static char gpfx; /* prefix for global symbol name (sometimes '_') */ @@ -418,6 +422,18 @@ static char const *already_has_rel_mcoun #define RECORD_MCOUNT_64 #include "recordmcount.h"
+static int arm_is_fake_mcount(Elf32_Rel const *rp) +{ + switch (ELF32_R_TYPE(w(rp->r_info))) { + case R_ARM_THM_CALL: + case R_ARM_CALL: + case R_ARM_PC24: + return 0; + } + + return 1; +} + /* 64-bit EM_MIPS has weird ELF64_Rela.r_info. * http://techpubs.sgi.com/library/manuals/4000/007-4658-001/pdf/007-4658-001.p... * We interpret Table 29 Relocation Operation (Elf64_Rel, Elf64_Rela) [p.40] @@ -523,6 +539,7 @@ static int do_file(char const *const fna altmcount = "__gnu_mcount_nc"; make_nop = make_nop_arm; rel_type_nop = R_ARM_NONE; + is_fake_mcount32 = arm_is_fake_mcount; gpfx = 0; break; case EM_AARCH64:
From: Michał Mirosław mirq-linux@rere.qmqm.pl
commit f571389c0b015e76f91c697c4c1700aba860d34f upstream.
Commit 7ad2ed1dfcbe inadvertently mixed up a quirk flag's name and broke SDR50 tuning override. Use correct NVQUIRK_ name.
Fixes: 7ad2ed1dfcbe ("mmc: tegra: enable UHS-I modes") Cc: stable@vger.kernel.org Acked-by: Adrian Hunter adrian.hunter@intel.com Reviewed-by: Thierry Reding treding@nvidia.com Tested-by: Thierry Reding treding@nvidia.com Signed-off-by: Michał Mirosław mirq-linux@rere.qmqm.pl Link: https://lore.kernel.org/r/9aff1d859935e59edd81e4939e40d6c55e0b55f6.157839038... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-tegra.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/mmc/host/sdhci-tegra.c +++ b/drivers/mmc/host/sdhci-tegra.c @@ -386,7 +386,7 @@ static void tegra_sdhci_reset(struct sdh misc_ctrl |= SDHCI_MISC_CTRL_ENABLE_DDR50; if (soc_data->nvquirks & NVQUIRK_ENABLE_SDR104) misc_ctrl |= SDHCI_MISC_CTRL_ENABLE_SDR104; - if (soc_data->nvquirks & SDHCI_MISC_CTRL_ENABLE_SDR50) + if (soc_data->nvquirks & NVQUIRK_ENABLE_SDR50) clk_ctrl |= SDHCI_CLOCK_CTRL_SDR50_TUNING_OVERRIDE; }
From: Michał Mirosław mirq-linux@rere.qmqm.pl
commit 2a187d03352086e300daa2044051db00044cd171 upstream.
For SDHCIv3+ with programmable clock mode, minimal clock frequency is still base clock / max(divider). Minimal programmable clock frequency is always greater than minimal divided clock frequency. Without this patch, SDHCI uses out-of-spec initial frequency when multiplier is big enough:
mmc1: mmc_rescan_try_freq: trying to init card at 468750 Hz [for 480 MHz source clock divided by 1024]
The code in sdhci_calc_clk() already chooses a correct SDCLK clock mode.
Fixes: c3ed3877625f ("mmc: sdhci: add support for programmable clock mode") Cc: stable@vger.kernel.org # 4f6aa3264af4: mmc: tegra: Only advertise UHS modes if IO regulator is present Cc: stable@vger.kernel.org Signed-off-by: Michał Mirosław mirq-linux@rere.qmqm.pl Acked-by: Adrian Hunter adrian.hunter@intel.com Link: https://lore.kernel.org/r/ffb489519a446caffe7a0a05c4b9372bd52397bb.157908203... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -3902,11 +3902,13 @@ int sdhci_setup_host(struct sdhci_host * if (host->ops->get_min_clock) mmc->f_min = host->ops->get_min_clock(host); else if (host->version >= SDHCI_SPEC_300) { - if (host->clk_mul) { - mmc->f_min = (host->max_clk * host->clk_mul) / 1024; + if (host->clk_mul) max_clk = host->max_clk * host->clk_mul; - } else - mmc->f_min = host->max_clk / SDHCI_MAX_DIV_SPEC_300; + /* + * Divided Clock Mode minimum clock rate is always less than + * Programmable Clock Mode minimum clock rate. + */ + mmc->f_min = host->max_clk / SDHCI_MAX_DIV_SPEC_300; } else mmc->f_min = host->max_clk / SDHCI_MAX_DIV_SPEC_200;
From: Faiz Abbas faiz_abbas@ti.com
commit 4d627c88546a697b07565dbb70d2f9f46a5ee76f upstream.
The MMC/SD controllers on am65x and j721e don't in fact detect the write protect line as inverted. No issues were detected because of this because the sdwp line is not connected on any of the evms. Fix this by removing the flag.
Fixes: 1accbced1c32 ("mmc: sdhci_am654: Add Support for 4 bit IP on J721E") Cc: stable@vger.kernel.org Signed-off-by: Faiz Abbas faiz_abbas@ti.com Acked-by: Adrian Hunter adrian.hunter@intel.com Link: https://lore.kernel.org/r/20200108143301.1929-2-faiz_abbas@ti.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci_am654.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -249,8 +249,7 @@ static struct sdhci_ops sdhci_am654_ops
static const struct sdhci_pltfm_data sdhci_am654_pdata = { .ops = &sdhci_am654_ops, - .quirks = SDHCI_QUIRK_INVERTED_WRITE_PROTECT | - SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12, + .quirks = SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12, .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN, };
@@ -272,8 +271,7 @@ static struct sdhci_ops sdhci_j721e_8bit
static const struct sdhci_pltfm_data sdhci_j721e_8bit_pdata = { .ops = &sdhci_j721e_8bit_ops, - .quirks = SDHCI_QUIRK_INVERTED_WRITE_PROTECT | - SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12, + .quirks = SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12, .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN, };
@@ -295,8 +293,7 @@ static struct sdhci_ops sdhci_j721e_4bit
static const struct sdhci_pltfm_data sdhci_j721e_4bit_pdata = { .ops = &sdhci_j721e_4bit_ops, - .quirks = SDHCI_QUIRK_INVERTED_WRITE_PROTECT | - SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12, + .quirks = SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12, .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN, };
From: Faiz Abbas faiz_abbas@ti.com
commit de31f6ab68a3f548d88686d53514f252d78f61d5 upstream.
The tuning data is leftover in the buffer after tuning. This can cause issues in future data commands, especially with CQHCI. Reset the command and data lines after tuning to continue from a clean state.
Fixes: 41fd4caeb00b ("mmc: sdhci_am654: Add Initial Support for AM654 SDHCI driver") Cc: stable@vger.kernel.org Signed-off-by: Faiz Abbas faiz_abbas@ti.com Acked-by: Adrian Hunter adrian.hunter@intel.com Link: https://lore.kernel.org/r/20200108143301.1929-3-faiz_abbas@ti.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci_am654.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -236,6 +236,22 @@ static void sdhci_am654_write_b(struct s writeb(val, host->ioaddr + reg); }
+static int sdhci_am654_execute_tuning(struct mmc_host *mmc, u32 opcode) +{ + struct sdhci_host *host = mmc_priv(mmc); + int err = sdhci_execute_tuning(mmc, opcode); + + if (err) + return err; + /* + * Tuning data remains in the buffer after tuning. + * Do a command and data reset to get rid of it + */ + sdhci_reset(host, SDHCI_RESET_CMD | SDHCI_RESET_DATA); + + return 0; +} + static struct sdhci_ops sdhci_am654_ops = { .get_max_clock = sdhci_pltfm_clk_get_max_clock, .get_timeout_clock = sdhci_pltfm_clk_get_max_clock, @@ -477,6 +493,8 @@ static int sdhci_am654_probe(struct plat goto pm_runtime_put; }
+ host->mmc_host_ops.execute_tuning = sdhci_am654_execute_tuning; + ret = sdhci_am654_init(host); if (ret) goto pm_runtime_put;
From: Ido Schimmel idosch@mellanox.com
commit 63963d0f9d17be83d0e419e03282847ecc2c3715 upstream.
The driver needs to prepend a Tx header to each packet it is transmitting. The header includes information such as the egress port and traffic class.
The addition of the header requires the driver to modify the SKB's header and therefore it must not be shared. Otherwise, we risk hitting various race conditions.
For example, when a packet is flooded (cloned) by the bridge driver to two switch ports swp1 and swp2:
t0 - mlxsw_sp_port_xmit() is called for swp1. Tx header is prepended with swp1's port number t1 - mlxsw_sp_port_xmit() is called for swp2. Tx header is prepended with swp2's port number, overwriting swp1's port number t2 - The device processes data buffer from t0. Packet is transmitted via swp2 t3 - The device processes data buffer from t1. Packet is transmitted via swp2
Usually, the device is fast enough and transmits the packet before its Tx header is overwritten, but this is not the case in emulated environments.
Fix this by making sure the SKB's header is writable by calling skb_cow_head(). Since the function ensures we have headroom to push the Tx header, the check further in the function can be removed.
v2: * Use skb_cow_head() instead of skb_unshare() as suggested by Jakub * Remove unnecessary check regarding headroom
Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support") Signed-off-by: Ido Schimmel idosch@mellanox.com Reported-by: Shalom Toledo shalomt@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c +++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c @@ -299,22 +299,17 @@ static netdev_tx_t mlxsw_sx_port_xmit(st u64 len; int err;
+ if (skb_cow_head(skb, MLXSW_TXHDR_LEN)) { + this_cpu_inc(mlxsw_sx_port->pcpu_stats->tx_dropped); + dev_kfree_skb_any(skb); + return NETDEV_TX_OK; + } + memset(skb->cb, 0, sizeof(struct mlxsw_skb_cb));
if (mlxsw_core_skb_transmit_busy(mlxsw_sx->core, &tx_info)) return NETDEV_TX_BUSY;
- if (unlikely(skb_headroom(skb) < MLXSW_TXHDR_LEN)) { - struct sk_buff *skb_orig = skb; - - skb = skb_realloc_headroom(skb, MLXSW_TXHDR_LEN); - if (!skb) { - this_cpu_inc(mlxsw_sx_port->pcpu_stats->tx_dropped); - dev_kfree_skb_any(skb_orig); - return NETDEV_TX_OK; - } - dev_consume_skb_any(skb_orig); - } mlxsw_sx_txhdr_construct(skb, &tx_info); /* TX header is consumed by HW on the way so we shouldn't count its * bytes as being sent.
From: Jakub Kicinski jakub.kicinski@netronome.com
commit db885e66d268884dc72967279b7e84f522556abc upstream.
Mallesham reports the TLS with async accelerator was broken by commit d10523d0b3d7 ("net/tls: free the record on encryption error") because encryption can return -EINPROGRESS in such setups, which should not be treated as an error.
The error is also present in the BPF path (likely copied from there).
Reported-by: Mallesham Jatharakonda mallesham.jatharakonda@oneconvergence.com Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tls/tls_sw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -793,7 +793,7 @@ static int bpf_exec_tx_verdict(struct sk psock = sk_psock_get(sk); if (!psock || !policy) { err = tls_push_record(sk, flags, record_type); - if (err) { + if (err && err != -EINPROGRESS) { *copied -= sk_msg_free(sk, msg); tls_free_open_rec(sk); } @@ -819,7 +819,7 @@ more_data: switch (psock->eval) { case __SK_PASS: err = tls_push_record(sk, flags, record_type); - if (err < 0) { + if (err && err != -EINPROGRESS) { *copied -= sk_msg_free(sk, msg); tls_free_open_rec(sk); goto out_err;
From: Stephan Gerhold stephan@gerhold.net
commit 996d5d5f89a558a3608a46e73ccd1b99f1b1d058 upstream.
Setting the vibrator enable_mask is not implemented correctly:
For regmap_update_bits(map, reg, mask, val) we give in either regs->enable_mask or 0 (= no-op) as mask and "val" as value. But "val" actually refers to the vibrator voltage control register, which has nothing to do with the enable_mask.
So we usually end up doing nothing when we really wanted to enable the vibrator.
We want to set or clear the enable_mask (to enable/disable the vibrator). Therefore, change the call to always modify the enable_mask and set the bits only if we want to enable the vibrator.
Fixes: d4c7c5c96c92 ("Input: pm8xxx-vib - handle separate enable register") Signed-off-by: Stephan Gerhold stephan@gerhold.net Link: https://lore.kernel.org/r/20200114183442.45720-1-stephan@gerhold.net Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/misc/pm8xxx-vibrator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/misc/pm8xxx-vibrator.c +++ b/drivers/input/misc/pm8xxx-vibrator.c @@ -90,7 +90,7 @@ static int pm8xxx_vib_set(struct pm8xxx_
if (regs->enable_mask) rc = regmap_update_bits(vib->regmap, regs->enable_addr, - on ? regs->enable_mask : 0, val); + regs->enable_mask, on ? ~0 : 0);
return rc; }
From: Johan Hovold johan@kernel.org
commit 6b32391ed675827f8425a414abbc6fbd54ea54fe upstream.
Make sure to use the current alternate setting when verifying the interface descriptors to avoid binding to an invalid interface.
This in turn could cause the driver to misbehave or trigger a WARN() in usb_submit_urb() that kernels with panic_on_warn set would choke on.
Fixes: bdb5c57f209c ("Input: add sur40 driver for Samsung SUR40 (aka MS Surface 2.0/Pixelsense)") Signed-off-by: Johan Hovold johan@kernel.org Acked-by: Vladis Dronov vdronov@redhat.com Link: https://lore.kernel.org/r/20191210113737.4016-8-johan@kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/touchscreen/sur40.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/touchscreen/sur40.c +++ b/drivers/input/touchscreen/sur40.c @@ -653,7 +653,7 @@ static int sur40_probe(struct usb_interf int error;
/* Check if we really have the right interface. */ - iface_desc = &interface->altsetting[0]; + iface_desc = interface->cur_altsetting; if (iface_desc->desc.bInterfaceClass != 0xFF) return -ENODEV;
From: Johan Hovold johan@kernel.org
commit a8eeb74df5a6bdb214b2b581b14782c5f5a0cf83 upstream.
The driver was checking the number of endpoints of the first alternate setting instead of the current one, something which could lead to the driver binding to an invalid interface.
This in turn could cause the driver to misbehave or trigger a WARN() in usb_submit_urb() that kernels with panic_on_warn set would choke on.
Fixes: 162f98dea487 ("Input: gtco - fix crash on detecting device without endpoints") Signed-off-by: Johan Hovold johan@kernel.org Acked-by: Vladis Dronov vdronov@redhat.com Link: https://lore.kernel.org/r/20191210113737.4016-5-johan@kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/tablet/gtco.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
--- a/drivers/input/tablet/gtco.c +++ b/drivers/input/tablet/gtco.c @@ -875,18 +875,14 @@ static int gtco_probe(struct usb_interfa }
/* Sanity check that a device has an endpoint */ - if (usbinterface->altsetting[0].desc.bNumEndpoints < 1) { + if (usbinterface->cur_altsetting->desc.bNumEndpoints < 1) { dev_err(&usbinterface->dev, "Invalid number of endpoints\n"); error = -EINVAL; goto err_free_urb; }
- /* - * The endpoint is always altsetting 0, we know this since we know - * this device only has one interrupt endpoint - */ - endpoint = &usbinterface->altsetting[0].endpoint[0].desc; + endpoint = &usbinterface->cur_altsetting->endpoint[0].desc;
/* Some debug */ dev_dbg(&usbinterface->dev, "gtco # interfaces: %d\n", usbinterface->num_altsetting); @@ -973,7 +969,7 @@ static int gtco_probe(struct usb_interfa input_dev->dev.parent = &usbinterface->dev;
/* Setup the URB, it will be posted later on open of input device */ - endpoint = &usbinterface->altsetting[0].endpoint[0].desc; + endpoint = &usbinterface->cur_altsetting->endpoint[0].desc;
usb_fill_int_urb(gtco->urbinfo, udev,
From: Johan Hovold johan@kernel.org
commit 3111491fca4f01764e0c158c5e0f7ced808eef51 upstream.
The driver was checking the number of endpoints of the first alternate setting instead of the current one, something which could lead to the driver binding to an invalid interface.
This in turn could cause the driver to misbehave or trigger a WARN() in usb_submit_urb() that kernels with panic_on_warn set would choke on.
Fixes: 8e20cf2bce12 ("Input: aiptek - fix crash on detecting device without endpoints") Signed-off-by: Johan Hovold johan@kernel.org Acked-by: Vladis Dronov vdronov@redhat.com Link: https://lore.kernel.org/r/20191210113737.4016-3-johan@kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/tablet/aiptek.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/input/tablet/aiptek.c +++ b/drivers/input/tablet/aiptek.c @@ -1802,14 +1802,14 @@ aiptek_probe(struct usb_interface *intf, input_set_abs_params(inputdev, ABS_WHEEL, AIPTEK_WHEEL_MIN, AIPTEK_WHEEL_MAX - 1, 0, 0);
/* Verify that a device really has an endpoint */ - if (intf->altsetting[0].desc.bNumEndpoints < 1) { + if (intf->cur_altsetting->desc.bNumEndpoints < 1) { dev_err(&intf->dev, "interface has %d endpoints, but must have minimum 1\n", - intf->altsetting[0].desc.bNumEndpoints); + intf->cur_altsetting->desc.bNumEndpoints); err = -EINVAL; goto fail3; } - endpoint = &intf->altsetting[0].endpoint[0].desc; + endpoint = &intf->cur_altsetting->endpoint[0].desc;
/* Go set up our URB, which is called when the tablet receives * input.
From: Johan Hovold johan@kernel.org
commit bcfcb7f9b480dd0be8f0df2df17340ca92a03b98 upstream.
The driver was checking the number of endpoints of the first alternate setting instead of the current one, something which could be used by a malicious device (or USB descriptor fuzzer) to trigger a NULL-pointer dereference.
Fixes: 1afca2b66aac ("Input: add Pegasus Notetaker tablet driver") Signed-off-by: Johan Hovold johan@kernel.org Acked-by: Martin Kepplinger martink@posteo.de Acked-by: Vladis Dronov vdronov@redhat.com Link: https://lore.kernel.org/r/20191210113737.4016-2-johan@kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/tablet/pegasus_notetaker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/tablet/pegasus_notetaker.c +++ b/drivers/input/tablet/pegasus_notetaker.c @@ -275,7 +275,7 @@ static int pegasus_probe(struct usb_inte return -ENODEV;
/* Sanity check that the device has an endpoint */ - if (intf->altsetting[0].desc.bNumEndpoints < 1) { + if (intf->cur_altsetting->desc.bNumEndpoints < 1) { dev_err(&intf->dev, "Invalid number of endpoints\n"); return -EINVAL; }
From: Chuhong Yuan hslester96@gmail.com
commit 97e24b095348a15ec08c476423c3b3b939186ad7 upstream.
The driver misses a check for devm_thermal_zone_of_sensor_register(). Add a check to fix it.
Fixes: e28d0c9cd381 ("input: convert sun4i-ts to use devm_thermal_zone_of_sensor_register") Signed-off-by: Chuhong Yuan hslester96@gmail.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/touchscreen/sun4i-ts.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/input/touchscreen/sun4i-ts.c +++ b/drivers/input/touchscreen/sun4i-ts.c @@ -237,6 +237,7 @@ static int sun4i_ts_probe(struct platfor struct device *dev = &pdev->dev; struct device_node *np = dev->of_node; struct device *hwmon; + struct thermal_zone_device *thermal; int error; u32 reg; bool ts_attached; @@ -355,7 +356,10 @@ static int sun4i_ts_probe(struct platfor if (IS_ERR(hwmon)) return PTR_ERR(hwmon);
- devm_thermal_zone_of_sensor_register(ts->dev, 0, ts, &sun4i_ts_tz_ops); + thermal = devm_thermal_zone_of_sensor_register(ts->dev, 0, ts, + &sun4i_ts_tz_ops); + if (IS_ERR(thermal)) + return PTR_ERR(thermal);
writel(TEMP_IRQ_EN(1), ts->base + TP_INT_FIFOC);
From: Florian Westphal fw@strlen.de
commit 7eaecf7963c1c8f62d62c6a8e7c439b0e7f2d365 upstream.
syzbot reports just another NULL deref crash because of missing test for presence of the attribute.
Reported-by: syzbot+cf23983d697c26c34f60@syzkaller.appspotmail.com Fixes: b96af92d6eaf9fadd ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/nft_osf.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/netfilter/nft_osf.c +++ b/net/netfilter/nft_osf.c @@ -61,6 +61,9 @@ static int nft_osf_init(const struct nft int err; u8 ttl;
+ if (!tb[NFTA_OSF_DREG]) + return -EINVAL; + if (tb[NFTA_OSF_TTL]) { ttl = nla_get_u8(tb[NFTA_OSF_TTL]); if (ttl > 2)
From: Christophe Leroy christophe.leroy@c-s.fr
commit ab10ae1c3bef56c29bac61e1201c752221b87b41 upstream.
The range passed to user_access_begin() by strncpy_from_user() and strnlen_user() starts at 'src' and goes up to the limit of userspace although reads will be limited by the 'count' param.
On 32 bits powerpc (book3s/32) access has to be granted for each 256Mbytes segment and the cost increases with the number of segments to unlock.
Limit the range with 'count' param.
Fixes: 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'") Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- lib/strncpy_from_user.c | 14 +++++++------- lib/strnlen_user.c | 14 +++++++------- 2 files changed, 14 insertions(+), 14 deletions(-)
--- a/lib/strncpy_from_user.c +++ b/lib/strncpy_from_user.c @@ -30,13 +30,6 @@ static inline long do_strncpy_from_user( const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS; unsigned long res = 0;
- /* - * Truncate 'max' to the user-specified limit, so that - * we only have one limit we need to check in the loop - */ - if (max > count) - max = count; - if (IS_UNALIGNED(src, dst)) goto byte_at_a_time;
@@ -114,6 +107,13 @@ long strncpy_from_user(char *dst, const unsigned long max = max_addr - src_addr; long retval;
+ /* + * Truncate 'max' to the user-specified limit, so that + * we only have one limit we need to check in the loop + */ + if (max > count) + max = count; + kasan_check_write(dst, count); check_object_size(dst, count, false); if (user_access_begin(src, max)) { --- a/lib/strnlen_user.c +++ b/lib/strnlen_user.c @@ -27,13 +27,6 @@ static inline long do_strnlen_user(const unsigned long c;
/* - * Truncate 'max' to the user-specified limit, so that - * we only have one limit we need to check in the loop - */ - if (max > count) - max = count; - - /* * Do everything aligned. But that means that we * need to also expand the maximum.. */ @@ -109,6 +102,13 @@ long strnlen_user(const char __user *str unsigned long max = max_addr - src_addr; long retval;
+ /* + * Truncate 'max' to the user-specified limit, so that + * we only have one limit we need to check in the loop + */ + if (max > count) + max = count; + if (user_access_begin(str, max)) { retval = do_strnlen_user(str, count, max); user_access_end();
From: Shuah Khan skhan@linuxfoundation.org
commit 8c17bbf6c8f70058a66305f2e1982552e6ea7f47 upstream.
init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable.
Add save and restore to fix it.
Signed-off-by: Shuah Khan skhan@linuxfoundation.org Fixes: 30861ddc9cca4 ("perf/x86/amd: Add IOMMU Performance Counter resource management") Reviewed-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Tested-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/iommu/amd_iommu_init.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-)
--- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -1655,27 +1655,39 @@ static int iommu_pc_get_set_reg(struct a static void init_iommu_perf_ctr(struct amd_iommu *iommu) { struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0; + u64 val = 0xabcd, val2 = 0, save_reg = 0;
if (!iommu_feature(iommu, FEATURE_PC)) return;
amd_iommu_pc_present = true;
+ /* save the value to restore, if writable */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false)) + goto pc_false; + /* Check if the performance counters can be written to */ if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) || (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) || - (val != val2)) { - pci_err(pdev, "Unable to write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; - } + (val != val2)) + goto pc_false; + + /* restore */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true)) + goto pc_false;
pci_info(pdev, "IOMMU performance counters supported\n");
val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); iommu->max_banks = (u8) ((val >> 12) & 0x3f); iommu->max_counters = (u8) ((val >> 7) & 0xf); + + return; + +pc_false: + pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); + amd_iommu_pc_present = false; + return; }
static ssize_t amd_iommu_show_cap(struct device *dev,
From: Linus Torvalds torvalds@linux-foundation.org
commit 3c2659bd1db81ed6a264a9fc6262d51667d655ad upstream.
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") I changed filldir to not do individual __put_user() accesses, but instead use unsafe_put_user() surrounded by the proper user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right one, since filldir() has the unfortunate problem that it needs to not only fill out the new directory entry, it also needs to fix up the previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the file offset of the directory entry itself - it's the offset of the next one. So we end up backfilling the offset in the previous entry as we walk along.
But since x86 didn't really care about the exact range, and used to be the only architecture that did anything fancy in user_access_begin() to begin with, the filldir[64]() changes did something lazy, and even commented on it:
/* * Note! This range-checks 'previous' (which may be NULL). * The real range was checked in getdents */ if (!user_access_begin(dirent, sizeof(*dirent))) goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(), and the fact that we faked the range to only be the (possibly not even valid) previous directory entry becomes a problem, because ppc32 will actually be using the range that is passed in for more than just "check that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a pointer to it in the filldir data structures, we can simplify the range check and the writing of the previous entry d_off field. No need for any conditionals in the user accesses themselves, although we retain the conditional EINTR checking for the "was this the first directory entry" signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.157969... Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.157978... Reported-and-tested-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/readdir.c | 73 ++++++++++++++++++++++++++++------------------------------- 1 file changed, 35 insertions(+), 38 deletions(-)
--- a/fs/readdir.c +++ b/fs/readdir.c @@ -206,7 +206,7 @@ struct linux_dirent { struct getdents_callback { struct dir_context ctx; struct linux_dirent __user * current_dir; - struct linux_dirent __user * previous; + int prev_reclen; int count; int error; }; @@ -214,12 +214,13 @@ struct getdents_callback { static int filldir(struct dir_context *ctx, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type) { - struct linux_dirent __user * dirent; + struct linux_dirent __user *dirent, *prev; struct getdents_callback *buf = container_of(ctx, struct getdents_callback, ctx); unsigned long d_ino; int reclen = ALIGN(offsetof(struct linux_dirent, d_name) + namlen + 2, sizeof(long)); + int prev_reclen;
buf->error = verify_dirent_name(name, namlen); if (unlikely(buf->error)) @@ -232,28 +233,24 @@ static int filldir(struct dir_context *c buf->error = -EOVERFLOW; return -EOVERFLOW; } - dirent = buf->previous; - if (dirent && signal_pending(current)) + prev_reclen = buf->prev_reclen; + if (prev_reclen && signal_pending(current)) return -EINTR; - - /* - * Note! This range-checks 'previous' (which may be NULL). - * The real range was checked in getdents - */ - if (!user_access_begin(dirent, sizeof(*dirent))) - goto efault; - if (dirent) - unsafe_put_user(offset, &dirent->d_off, efault_end); dirent = buf->current_dir; + prev = (void __user *) dirent - prev_reclen; + if (!user_access_begin(prev, reclen + prev_reclen)) + goto efault; + + /* This might be 'dirent->d_off', but if so it will get overwritten */ + unsafe_put_user(offset, &prev->d_off, efault_end); unsafe_put_user(d_ino, &dirent->d_ino, efault_end); unsafe_put_user(reclen, &dirent->d_reclen, efault_end); unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end); unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end); user_access_end();
- buf->previous = dirent; - dirent = (void __user *)dirent + reclen; - buf->current_dir = dirent; + buf->current_dir = (void __user *)dirent + reclen; + buf->prev_reclen = reclen; buf->count -= reclen; return 0; efault_end: @@ -267,7 +264,6 @@ SYSCALL_DEFINE3(getdents, unsigned int, struct linux_dirent __user *, dirent, unsigned int, count) { struct fd f; - struct linux_dirent __user * lastdirent; struct getdents_callback buf = { .ctx.actor = filldir, .count = count, @@ -285,8 +281,10 @@ SYSCALL_DEFINE3(getdents, unsigned int, error = iterate_dir(f.file, &buf.ctx); if (error >= 0) error = buf.error; - lastdirent = buf.previous; - if (lastdirent) { + if (buf.prev_reclen) { + struct linux_dirent __user * lastdirent; + lastdirent = (void __user *)buf.current_dir - buf.prev_reclen; + if (put_user(buf.ctx.pos, &lastdirent->d_off)) error = -EFAULT; else @@ -299,7 +297,7 @@ SYSCALL_DEFINE3(getdents, unsigned int, struct getdents_callback64 { struct dir_context ctx; struct linux_dirent64 __user * current_dir; - struct linux_dirent64 __user * previous; + int prev_reclen; int count; int error; }; @@ -307,11 +305,12 @@ struct getdents_callback64 { static int filldir64(struct dir_context *ctx, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type) { - struct linux_dirent64 __user *dirent; + struct linux_dirent64 __user *dirent, *prev; struct getdents_callback64 *buf = container_of(ctx, struct getdents_callback64, ctx); int reclen = ALIGN(offsetof(struct linux_dirent64, d_name) + namlen + 1, sizeof(u64)); + int prev_reclen;
buf->error = verify_dirent_name(name, namlen); if (unlikely(buf->error)) @@ -319,30 +318,27 @@ static int filldir64(struct dir_context buf->error = -EINVAL; /* only used if we fail.. */ if (reclen > buf->count) return -EINVAL; - dirent = buf->previous; - if (dirent && signal_pending(current)) + prev_reclen = buf->prev_reclen; + if (prev_reclen && signal_pending(current)) return -EINTR; - - /* - * Note! This range-checks 'previous' (which may be NULL). - * The real range was checked in getdents - */ - if (!user_access_begin(dirent, sizeof(*dirent))) - goto efault; - if (dirent) - unsafe_put_user(offset, &dirent->d_off, efault_end); dirent = buf->current_dir; + prev = (void __user *)dirent - prev_reclen; + if (!user_access_begin(prev, reclen + prev_reclen)) + goto efault; + + /* This might be 'dirent->d_off', but if so it will get overwritten */ + unsafe_put_user(offset, &prev->d_off, efault_end); unsafe_put_user(ino, &dirent->d_ino, efault_end); unsafe_put_user(reclen, &dirent->d_reclen, efault_end); unsafe_put_user(d_type, &dirent->d_type, efault_end); unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end); user_access_end();
- buf->previous = dirent; - dirent = (void __user *)dirent + reclen; - buf->current_dir = dirent; + buf->prev_reclen = reclen; + buf->current_dir = (void __user *)dirent + reclen; buf->count -= reclen; return 0; + efault_end: user_access_end(); efault: @@ -354,7 +350,6 @@ int ksys_getdents64(unsigned int fd, str unsigned int count) { struct fd f; - struct linux_dirent64 __user * lastdirent; struct getdents_callback64 buf = { .ctx.actor = filldir64, .count = count, @@ -372,9 +367,11 @@ int ksys_getdents64(unsigned int fd, str error = iterate_dir(f.file, &buf.ctx); if (error >= 0) error = buf.error; - lastdirent = buf.previous; - if (lastdirent) { + if (buf.prev_reclen) { + struct linux_dirent64 __user * lastdirent; typeof(lastdirent->d_off) d_off = buf.ctx.pos; + + lastdirent = (void __user *) buf.current_dir - buf.prev_reclen; if (__put_user(d_off, &lastdirent->d_off)) error = -EFAULT; else
From: Jacek Anaszewski jacek.anaszewski@gmail.com
commit 90a8e82d3ca8c1f85ac63f4a94c9b034f05af4ee upstream.
When switching to using generic LED name composition mechanism via devm_led_classdev_register_ext() API the part of code initializing struct gpio_led's template name property was removed alongside. It was however overlooked that the property was also passed to devm_fwnode_get_gpiod_from_child() in place of "label" parameter, which when set to NULL, results in gpio label being initialized to '?'.
It could be observed in debugfs and failed to properly identify gpio association with LED consumer.
Fix this shortcoming by updating the GPIO label after the LED is registered and its final name is known.
Fixes: d7235f5feaa0 ("leds: gpio: Use generic support for composing LED names") Cc: Russell King linux@armlinux.org.uk Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Jacek Anaszewski jacek.anaszewski@gmail.com [fixed comment] Signed-off-by: Pavel Machek pavel@ucw.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/leds/leds-gpio.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/leds/leds-gpio.c +++ b/drivers/leds/leds-gpio.c @@ -151,9 +151,14 @@ static struct gpio_leds_priv *gpio_leds_ struct gpio_led led = {}; const char *state = NULL;
+ /* + * Acquire gpiod from DT with uninitialized label, which + * will be updated after LED class device is registered, + * Only then the final LED name is known. + */ led.gpiod = devm_fwnode_get_gpiod_from_child(dev, NULL, child, GPIOD_ASIS, - led.name); + NULL); if (IS_ERR(led.gpiod)) { fwnode_handle_put(child); return ERR_CAST(led.gpiod); @@ -186,6 +191,9 @@ static struct gpio_leds_priv *gpio_leds_ fwnode_handle_put(child); return ERR_PTR(ret); } + /* Set gpiod label to match the corresponding LED name. */ + gpiod_set_consumer_name(led_dat->gpiod, + led_dat->cdev.dev->kobj.name); priv->num_leds++; }
From: xiaofeng.yan yanxiaofeng7@jd.com
commit 80892772c4edac88c538165d26a0105f19b61c1c upstream.
A compliation error happen when building branch 5.5-rc7
In file included from net/hsr/hsr_main.c:12:0: net/hsr/hsr_main.h:194:20: error: two or more data types in declaration specifiers static inline void void hsr_debugfs_rename(struct net_device *dev)
So Removed one void.
Fixes: 4c2d5e33dcd3 ("hsr: rename debugfs file when interface name is changed") Signed-off-by: xiaofeng.yan yanxiaofeng7@jd.com Acked-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/hsr/hsr_main.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/hsr/hsr_main.h +++ b/net/hsr/hsr_main.h @@ -191,7 +191,7 @@ void hsr_debugfs_term(struct hsr_priv *p void hsr_debugfs_create_root(void); void hsr_debugfs_remove_root(void); #else -static inline void void hsr_debugfs_rename(struct net_device *dev) +static inline void hsr_debugfs_rename(struct net_device *dev) { } static inline void hsr_debugfs_init(struct hsr_priv *priv,
From: Gilles Buloz gilles.buloz@kontron.com
commit 7713e62c8623c54dac88d1fa724aa487a38c3efb upstream.
in0 thresholds are written to the in2 thresholds registers in2 thresholds to in3 thresholds in3 thresholds to in4 thresholds in4 thresholds to in0 thresholds
Signed-off-by: Gilles Buloz gilles.buloz@kontron.com Link: https://lore.kernel.org/r/5de0f509.rc0oEvPOMjbfPW1w%gilles.buloz@kontron.com Fixes: 3434f3783580 ("hwmon: Driver for Nuvoton NCT7802Y") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/nct7802.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/hwmon/nct7802.c +++ b/drivers/hwmon/nct7802.c @@ -23,8 +23,8 @@ static const u8 REG_VOLTAGE[5] = { 0x09, 0x0a, 0x0c, 0x0d, 0x0e };
static const u8 REG_VOLTAGE_LIMIT_LSB[2][5] = { - { 0x40, 0x00, 0x42, 0x44, 0x46 }, - { 0x3f, 0x00, 0x41, 0x43, 0x45 }, + { 0x46, 0x00, 0x40, 0x42, 0x44 }, + { 0x45, 0x00, 0x3f, 0x41, 0x43 }, };
static const u8 REG_VOLTAGE_LIMIT_MSB[5] = { 0x48, 0x00, 0x47, 0x47, 0x48 };
From: Gilles Buloz gilles.buloz@kontron.com
commit e51a7dda299815e92f43960d620cdfc8dfc144f2 upstream.
No alarm is reported by /sys/.../inX_alarm
In detail:
The SMI Voltage status register is the only register giving a status for voltages, but it does not work like the non-SMI status registers used for temperatures and fans. A bit is set for each input crossing a threshold, in both direction, but the "inside" or "outside" limits info is not available. Also this register is cleared on read. Note : this is not explicitly spelled out in the datasheet, but from experiment. As a result if an input is crossing a threshold (min or max in any direction), the alarm is reported only once even if the input is still outside limits. Also if the alarm for another input is read before the one of this input, no alarm is reported at all.
Signed-off-by: Gilles Buloz gilles.buloz@kontron.com Link: https://lore.kernel.org/r/5de0f566.tBga5POKAgHlmd0p%gilles.buloz@kontron.com Fixes: 3434f3783580 ("hwmon: Driver for Nuvoton NCT7802Y") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/nct7802.c | 71 +++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 67 insertions(+), 4 deletions(-)
--- a/drivers/hwmon/nct7802.c +++ b/drivers/hwmon/nct7802.c @@ -58,6 +58,8 @@ static const u8 REG_VOLTAGE_LIMIT_MSB_SH struct nct7802_data { struct regmap *regmap; struct mutex access_lock; /* for multi-byte read and write operations */ + u8 in_status; + struct mutex in_alarm_lock; };
static ssize_t temp_type_show(struct device *dev, @@ -368,6 +370,66 @@ static ssize_t in_store(struct device *d return err ? : count; }
+static ssize_t in_alarm_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct sensor_device_attribute_2 *sattr = to_sensor_dev_attr_2(attr); + struct nct7802_data *data = dev_get_drvdata(dev); + int volt, min, max, ret; + unsigned int val; + + mutex_lock(&data->in_alarm_lock); + + /* + * The SMI Voltage status register is the only register giving a status + * for voltages. A bit is set for each input crossing a threshold, in + * both direction, but the "inside" or "outside" limits info is not + * available. Also this register is cleared on read. + * Note: this is not explicitly spelled out in the datasheet, but + * from experiment. + * To deal with this we use a status cache with one validity bit and + * one status bit for each input. Validity is cleared at startup and + * each time the register reports a change, and the status is processed + * by software based on current input value and limits. + */ + ret = regmap_read(data->regmap, 0x1e, &val); /* SMI Voltage status */ + if (ret < 0) + goto abort; + + /* invalidate cached status for all inputs crossing a threshold */ + data->in_status &= ~((val & 0x0f) << 4); + + /* if cached status for requested input is invalid, update it */ + if (!(data->in_status & (0x10 << sattr->index))) { + ret = nct7802_read_voltage(data, sattr->nr, 0); + if (ret < 0) + goto abort; + volt = ret; + + ret = nct7802_read_voltage(data, sattr->nr, 1); + if (ret < 0) + goto abort; + min = ret; + + ret = nct7802_read_voltage(data, sattr->nr, 2); + if (ret < 0) + goto abort; + max = ret; + + if (volt < min || volt > max) + data->in_status |= (1 << sattr->index); + else + data->in_status &= ~(1 << sattr->index); + + data->in_status |= 0x10 << sattr->index; + } + + ret = sprintf(buf, "%u\n", !!(data->in_status & (1 << sattr->index))); +abort: + mutex_unlock(&data->in_alarm_lock); + return ret; +} + static ssize_t temp_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -660,7 +722,7 @@ static const struct attribute_group nct7 static SENSOR_DEVICE_ATTR_2_RO(in0_input, in, 0, 0); static SENSOR_DEVICE_ATTR_2_RW(in0_min, in, 0, 1); static SENSOR_DEVICE_ATTR_2_RW(in0_max, in, 0, 2); -static SENSOR_DEVICE_ATTR_2_RO(in0_alarm, alarm, 0x1e, 3); +static SENSOR_DEVICE_ATTR_2_RO(in0_alarm, in_alarm, 0, 3); static SENSOR_DEVICE_ATTR_2_RW(in0_beep, beep, 0x5a, 3);
static SENSOR_DEVICE_ATTR_2_RO(in1_input, in, 1, 0); @@ -668,19 +730,19 @@ static SENSOR_DEVICE_ATTR_2_RO(in1_input static SENSOR_DEVICE_ATTR_2_RO(in2_input, in, 2, 0); static SENSOR_DEVICE_ATTR_2_RW(in2_min, in, 2, 1); static SENSOR_DEVICE_ATTR_2_RW(in2_max, in, 2, 2); -static SENSOR_DEVICE_ATTR_2_RO(in2_alarm, alarm, 0x1e, 0); +static SENSOR_DEVICE_ATTR_2_RO(in2_alarm, in_alarm, 2, 0); static SENSOR_DEVICE_ATTR_2_RW(in2_beep, beep, 0x5a, 0);
static SENSOR_DEVICE_ATTR_2_RO(in3_input, in, 3, 0); static SENSOR_DEVICE_ATTR_2_RW(in3_min, in, 3, 1); static SENSOR_DEVICE_ATTR_2_RW(in3_max, in, 3, 2); -static SENSOR_DEVICE_ATTR_2_RO(in3_alarm, alarm, 0x1e, 1); +static SENSOR_DEVICE_ATTR_2_RO(in3_alarm, in_alarm, 3, 1); static SENSOR_DEVICE_ATTR_2_RW(in3_beep, beep, 0x5a, 1);
static SENSOR_DEVICE_ATTR_2_RO(in4_input, in, 4, 0); static SENSOR_DEVICE_ATTR_2_RW(in4_min, in, 4, 1); static SENSOR_DEVICE_ATTR_2_RW(in4_max, in, 4, 2); -static SENSOR_DEVICE_ATTR_2_RO(in4_alarm, alarm, 0x1e, 2); +static SENSOR_DEVICE_ATTR_2_RO(in4_alarm, in_alarm, 4, 2); static SENSOR_DEVICE_ATTR_2_RW(in4_beep, beep, 0x5a, 2);
static struct attribute *nct7802_in_attrs[] = { @@ -1011,6 +1073,7 @@ static int nct7802_probe(struct i2c_clie return PTR_ERR(data->regmap);
mutex_init(&data->access_lock); + mutex_init(&data->in_alarm_lock);
ret = nct7802_init_chip(data); if (ret < 0)
From: Bart Van Assche bvanassche@acm.org
commit 04060db41178c7c244f2c7dcd913e7fd331de915 upstream.
iscsit_close_connection() calls isert_wait_conn(). Due to commit e9d3009cb936 both functions call target_wait_for_sess_cmds() although that last function should be called only once. Fix this by removing the target_wait_for_sess_cmds() call from isert_wait_conn() and by only calling isert_wait_conn() after target_wait_for_sess_cmds().
Fixes: e9d3009cb936 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session"). Link: https://lore.kernel.org/r/20200116044737.19507-1-bvanassche@acm.org Reported-by: Rahul Kundu rahul.kundu@chelsio.com Signed-off-by: Bart Van Assche bvanassche@acm.org Tested-by: Mike Marciniszyn mike.marciniszyn@intel.com Acked-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/ulp/isert/ib_isert.c | 12 ------------ drivers/target/iscsi/iscsi_target.c | 6 +++--- 2 files changed, 3 insertions(+), 15 deletions(-)
--- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -2575,17 +2575,6 @@ isert_wait4logout(struct isert_conn *ise } }
-static void -isert_wait4cmds(struct iscsi_conn *conn) -{ - isert_info("iscsi_conn %p\n", conn); - - if (conn->sess) { - target_sess_cmd_list_set_waiting(conn->sess->se_sess); - target_wait_for_sess_cmds(conn->sess->se_sess); - } -} - /** * isert_put_unsol_pending_cmds() - Drop commands waiting for * unsolicitate dataout @@ -2633,7 +2622,6 @@ static void isert_wait_conn(struct iscsi
ib_drain_qp(isert_conn->qp); isert_put_unsol_pending_cmds(conn); - isert_wait4cmds(conn); isert_wait4logout(isert_conn);
queue_work(isert_release_wq, &isert_conn->release_work); --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -4151,9 +4151,6 @@ int iscsit_close_connection( iscsit_stop_nopin_response_timer(conn); iscsit_stop_nopin_timer(conn);
- if (conn->conn_transport->iscsit_wait_conn) - conn->conn_transport->iscsit_wait_conn(conn); - /* * During Connection recovery drop unacknowledged out of order * commands for this connection, and prepare the other commands @@ -4239,6 +4236,9 @@ int iscsit_close_connection( target_sess_cmd_list_set_waiting(sess->se_sess); target_wait_for_sess_cmds(sess->se_sess);
+ if (conn->conn_transport->iscsit_wait_conn) + conn->conn_transport->iscsit_wait_conn(conn); + ahash_request_free(conn->conn_tx_hash); if (conn->conn_rx_hash) { struct crypto_ahash *tfm;
From: Changbin Du changbin.du@gmail.com
commit d0695e2351102affd8efae83989056bc4b275917 upstream.
Just as commit 0566e40ce7 ("tracing: initcall: Ordered comparison of function pointers"), this patch fixes another remaining one in xen.h found by clang-9.
In file included from arch/x86/xen/trace.c:21: In file included from ./include/trace/events/xen.h:475: In file included from ./include/trace/define_trace.h:102: In file included from ./include/trace/trace_events.h:473: ./include/trace/events/xen.h:69:7: warning: ordered comparison of function \ pointers ('xen_mc_callback_fn_t' (aka 'void (*)(void *)') and 'xen_mc_callback_fn_t') [-Wordered-compare-function-pointers] __field(xen_mc_callback_fn_t, fn) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/trace/trace_events.h:421:29: note: expanded from macro '__field' ^ ./include/trace/trace_events.h:407:6: note: expanded from macro '__field_ext' is_signed_type(type), filter_type); \ ^ ./include/linux/trace_events.h:554:44: note: expanded from macro 'is_signed_type' ^
Fixes: c796f213a6934 ("xen/trace: add multicall tracing") Signed-off-by: Changbin Du changbin.du@gmail.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/trace/events/xen.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/include/trace/events/xen.h +++ b/include/trace/events/xen.h @@ -66,7 +66,11 @@ TRACE_EVENT(xen_mc_callback, TP_PROTO(xen_mc_callback_fn_t fn, void *data), TP_ARGS(fn, data), TP_STRUCT__entry( - __field(xen_mc_callback_fn_t, fn) + /* + * Use field_struct to avoid is_signed_type() + * comparison of a function pointer. + */ + __field_struct(xen_mc_callback_fn_t, fn) __field(void *, data) ), TP_fast_assign(
From: Johannes Berg johannes.berg@intel.com
commit b9f726c94224e863d4d3458dfec2e7e1284a39ce upstream.
It used to be the case that if we got here, we wouldn't warn but instead allocate the queue (DQA). With using the mac80211 TXQs model this changed, and we really have nothing to do with the frame here anymore, hence the warning now.
However, clearly we missed in coding & review that this is now a pure error path and leaks the SKB if we return 0 instead of an indication that the SKB needs to be freed. Fix this.
Signed-off-by: Johannes Berg johannes.berg@intel.com Fixes: cfbc6c4c5b91 ("iwlwifi: mvm: support mac80211 TXQs model") Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c @@ -1151,7 +1151,7 @@ static int iwl_mvm_tx_mpdu(struct iwl_mv if (WARN_ONCE(txq_id == IWL_MVM_INVALID_QUEUE, "Invalid TXQ id")) { iwl_trans_free_tx_cmd(mvm->trans, dev_cmd); spin_unlock(&mvmsta->lock); - return 0; + return -1; }
if (!iwl_mvm_has_new_tx_api(mvm)) {
From: Johannes Berg johannes.berg@intel.com
commit df2378ab0f2a9dd4cf4501268af1902cc4ebacd8 upstream.
When we transmit after TXQ dequeue, we aren't paying attention to the return value of the transmit functions, leading to a potential SKB leak.
Refactor the code a bit (and rename ..._tx to ..._tx_sta) to check for this happening.
Signed-off-by: Johannes Berg johannes.berg@intel.com Fixes: cfbc6c4c5b91 ("iwlwifi: mvm: support mac80211 TXQs model") Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 28 ++++++++++++---------- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 4 +-- drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 4 +-- 3 files changed, 20 insertions(+), 16 deletions(-)
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -742,6 +742,20 @@ int iwl_mvm_mac_setup_register(struct iw return ret; }
+static void iwl_mvm_tx_skb(struct iwl_mvm *mvm, struct sk_buff *skb, + struct ieee80211_sta *sta) +{ + if (likely(sta)) { + if (likely(iwl_mvm_tx_skb_sta(mvm, skb, sta) == 0)) + return; + } else { + if (likely(iwl_mvm_tx_skb_non_sta(mvm, skb) == 0)) + return; + } + + ieee80211_free_txskb(mvm->hw, skb); +} + static void iwl_mvm_mac_tx(struct ieee80211_hw *hw, struct ieee80211_tx_control *control, struct sk_buff *skb) @@ -785,14 +799,7 @@ static void iwl_mvm_mac_tx(struct ieee80 } }
- if (sta) { - if (iwl_mvm_tx_skb(mvm, skb, sta)) - goto drop; - return; - } - - if (iwl_mvm_tx_skb_non_sta(mvm, skb)) - goto drop; + iwl_mvm_tx_skb(mvm, skb, sta); return; drop: ieee80211_free_txskb(hw, skb); @@ -842,10 +849,7 @@ void iwl_mvm_mac_itxq_xmit(struct ieee80 break; }
- if (!txq->sta) - iwl_mvm_tx_skb_non_sta(mvm, skb); - else - iwl_mvm_tx_skb(mvm, skb, txq->sta); + iwl_mvm_tx_skb(mvm, skb, txq->sta); } } while (atomic_dec_return(&mvmtxq->tx_request)); rcu_read_unlock(); --- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h @@ -1508,8 +1508,8 @@ int __must_check iwl_mvm_send_cmd_status int __must_check iwl_mvm_send_cmd_pdu_status(struct iwl_mvm *mvm, u32 id, u16 len, const void *data, u32 *status); -int iwl_mvm_tx_skb(struct iwl_mvm *mvm, struct sk_buff *skb, - struct ieee80211_sta *sta); +int iwl_mvm_tx_skb_sta(struct iwl_mvm *mvm, struct sk_buff *skb, + struct ieee80211_sta *sta); int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct sk_buff *skb); void iwl_mvm_set_tx_cmd(struct iwl_mvm *mvm, struct sk_buff *skb, struct iwl_tx_cmd *tx_cmd, --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c @@ -1203,8 +1203,8 @@ drop: return -1; }
-int iwl_mvm_tx_skb(struct iwl_mvm *mvm, struct sk_buff *skb, - struct ieee80211_sta *sta) +int iwl_mvm_tx_skb_sta(struct iwl_mvm *mvm, struct sk_buff *skb, + struct ieee80211_sta *sta) { struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta); struct ieee80211_tx_info info;
From: Matthew Auld matthew.auld@intel.com
commit ecc4d2a52df65479de5e333a9065ed02202a400f upstream.
If we create a rather large userptr object(e.g 1ULL << 32) we might shift past the type-width of num_pages: (int)num_pages << PAGE_SHIFT, resulting in a totally bogus sg_table, which fortunately will eventually manifest as:
gen8_ppgtt_insert_huge:463 GEM_BUG_ON(iter->sg->length < page_size) kernel BUG at drivers/gpu/drm/i915/gt/gen8_ppgtt.c:463!
v2: more unsigned long prefer I915_GTT_PAGE_SIZE
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl") Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Chris Wilson chris@chris-wilson.co.uk Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20200117132413.1170563-2-matth... (cherry picked from commit 8e78871bc1e5efec22c950d3fd24ddb63d4ff28a) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 9 +++++---- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ 2 files changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -427,7 +427,7 @@ struct get_pages_work {
static struct sg_table * __i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj, - struct page **pvec, int num_pages) + struct page **pvec, unsigned long num_pages) { unsigned int max_segment = i915_sg_segment_size(); struct sg_table *st; @@ -473,9 +473,10 @@ __i915_gem_userptr_get_pages_worker(stru { struct get_pages_work *work = container_of(_work, typeof(*work), work); struct drm_i915_gem_object *obj = work->obj; - const int npages = obj->base.size >> PAGE_SHIFT; + const unsigned long npages = obj->base.size >> PAGE_SHIFT; + unsigned long pinned; struct page **pvec; - int pinned, ret; + int ret;
ret = -ENOMEM; pinned = 0; @@ -578,7 +579,7 @@ __i915_gem_userptr_get_pages_schedule(st
static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) { - const int num_pages = obj->base.size >> PAGE_SHIFT; + const unsigned long num_pages = obj->base.size >> PAGE_SHIFT; struct mm_struct *mm = obj->userptr.mm->mm; struct page **pvec; struct sg_table *pages; --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1178,6 +1178,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1))); do { + GEM_BUG_ON(iter->sg->length < I915_GTT_PAGE_SIZE); vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
iter->dma += I915_GTT_PAGE_SIZE; @@ -1657,6 +1658,7 @@ static void gen6_ppgtt_insert_entries(st
vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt)); do { + GEM_BUG_ON(iter.sg->length < I915_GTT_PAGE_SIZE); vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
iter.dma += I915_GTT_PAGE_SIZE;
From: Ulrich Weber ulrich.weber@gmail.com
commit 4e4362d2bf2a49ff44dbbc9585207977ca3d71d0 upstream.
Commit 9b42c1f179a6 ("xfrm: Extend the output_mark") added output_mark support but missed ESP offload support.
xfrm_smark_get() is not called within xfrm_input() for packets coming from esp4_gro_receive() or esp6_gro_receive(). Therefore call xfrm_smark_get() directly within these functions.
Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Ulrich Weber ulrich.weber@gmail.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/esp4_offload.c | 2 ++ net/ipv6/esp6_offload.c | 2 ++ 2 files changed, 4 insertions(+)
--- a/net/ipv4/esp4_offload.c +++ b/net/ipv4/esp4_offload.c @@ -57,6 +57,8 @@ static struct sk_buff *esp4_gro_receive( if (!x) goto out_reset;
+ skb->mark = xfrm_smark_get(skb->mark, x); + sp->xvec[sp->len++] = x; sp->olen++;
--- a/net/ipv6/esp6_offload.c +++ b/net/ipv6/esp6_offload.c @@ -79,6 +79,8 @@ static struct sk_buff *esp6_gro_receive( if (!x) goto out_reset;
+ skb->mark = xfrm_smark_get(skb->mark, x); + sp->xvec[sp->len++] = x; sp->olen++;
From: Jakub Sitnicki jakub@cloudflare.com
commit 58c8db929db1c1d785a6f5d8f8692e5dbcc35e84 upstream.
As John Fastabend reports [0], psock state tear-down can happen on receive path *after* unlocking the socket, if the only other psock user, that is sockmap or sockhash, releases its psock reference before tcp_bpf_recvmsg does so:
tcp_bpf_recvmsg() psock = sk_psock_get(sk) <- refcnt 2 lock_sock(sk); ... sock_map_free() <- refcnt 1 release_sock(sk) sk_psock_put() <- refcnt 0
Remove the lockdep check for socket lock in psock tear-down that got introduced in 7e81a3530206 ("bpf: Sockmap, ensure sock lock held during tear down").
[0] https://lore.kernel.org/netdev/5e25dc995d7d_74082aaee6e465b441@john-XPS-13-9...
Fixes: 7e81a3530206 ("bpf: Sockmap, ensure sock lock held during tear down") Reported-by: syzbot+d73682fcf7fee6982fe3@syzkaller.appspotmail.com Suggested-by: John Fastabend john.fastabend@gmail.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/core/skmsg.c | 2 -- 1 file changed, 2 deletions(-)
--- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -594,8 +594,6 @@ EXPORT_SYMBOL_GPL(sk_psock_destroy);
void sk_psock_drop(struct sock *sk, struct sk_psock *psock) { - sock_owned_by_me(sk); - sk_psock_cork_free(psock); sk_psock_zap_ingress(psock);
From: Al Viro viro@zeniv.linux.org.uk
commit d0cb50185ae942b03c4327be322055d622dc79f6 upstream.
may_create_in_sticky() call is done when we already have dropped the reference to dir.
Fixes: 30aba6656f61e (namei: allow restricted O_CREAT of FIFOs and regular files) Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/namei.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
--- a/fs/namei.c +++ b/fs/namei.c @@ -1001,7 +1001,8 @@ static int may_linkat(struct path *link) * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory * should be allowed, or not, on files that already * exist. - * @dir: the sticky parent directory + * @dir_mode: mode bits of directory + * @dir_uid: owner of directory * @inode: the inode of the file to open * * Block an O_CREAT open of a FIFO (or a regular file) when: @@ -1017,18 +1018,18 @@ static int may_linkat(struct path *link) * * Returns 0 if the open is allowed, -ve on error. */ -static int may_create_in_sticky(struct dentry * const dir, +static int may_create_in_sticky(umode_t dir_mode, kuid_t dir_uid, struct inode * const inode) { if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) || (!sysctl_protected_regular && S_ISREG(inode->i_mode)) || - likely(!(dir->d_inode->i_mode & S_ISVTX)) || - uid_eq(inode->i_uid, dir->d_inode->i_uid) || + likely(!(dir_mode & S_ISVTX)) || + uid_eq(inode->i_uid, dir_uid) || uid_eq(current_fsuid(), inode->i_uid)) return 0;
- if (likely(dir->d_inode->i_mode & 0002) || - (dir->d_inode->i_mode & 0020 && + if (likely(dir_mode & 0002) || + (dir_mode & 0020 && ((sysctl_protected_fifos >= 2 && S_ISFIFO(inode->i_mode)) || (sysctl_protected_regular >= 2 && S_ISREG(inode->i_mode))))) { return -EACCES; @@ -3248,6 +3249,8 @@ static int do_last(struct nameidata *nd, struct file *file, const struct open_flags *op) { struct dentry *dir = nd->path.dentry; + kuid_t dir_uid = dir->d_inode->i_uid; + umode_t dir_mode = dir->d_inode->i_mode; int open_flag = op->open_flag; bool will_truncate = (open_flag & O_TRUNC) != 0; bool got_write = false; @@ -3383,7 +3386,7 @@ finish_open: error = -EISDIR; if (d_is_dir(nd->path.dentry)) goto out; - error = may_create_in_sticky(dir, + error = may_create_in_sticky(dir_mode, dir_uid, d_backing_inode(nd->path.dentry)); if (unlikely(error)) goto out;
From: Linus Torvalds torvalds@linux-foundation.org
commit 2c6b7bcd747201441923a0d3062577a8d1fbd8f8 upstream.
Commit 8a23eb804ca4 ("Make filldir[64]() verify the directory entry filename is valid") added some minimal validity checks on the directory entries passed to filldir[64](). But they really were pretty minimal.
This fleshes out at least the name length check: we used to disallow zero-length names, but really, negative lengths or oevr-long names aren't ok either. Both could happen if there is some filesystem corruption going on.
Now, most filesystems tend to use just an "unsigned char" or similar for the length of a directory entry name, so even with a corrupt filesystem you should never see anything odd like that. But since we then use the name length to create the directory entry record length, let's make sure it actually is half-way sensible.
Note how POSIX states that the size of a path component is limited by NAME_MAX, but we actually use PATH_MAX for the check here. That's because while NAME_MAX is generally the correct maximum name length (it's 255, for the same old "name length is usually just a byte on disk"), there's nothing in the VFS layer that really cares.
So the real limitation at a VFS layer is the total pathname length you can pass as a filename: PATH_MAX.
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/readdir.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/readdir.c +++ b/fs/readdir.c @@ -102,10 +102,14 @@ EXPORT_SYMBOL(iterate_dir); * filename length, and the above "soft error" worry means * that it's probably better left alone until we have that * issue clarified. + * + * Note the PATH_MAX check - it's arbitrary but the real + * kernel limit on a possible path component, not NAME_MAX, + * which is the technical standard limit. */ static int verify_dirent_name(const char *name, int len) { - if (!len) + if (len <= 0 || len >= PATH_MAX) return -EIO; if (memchr(name, '/', len)) return -EIO;
From: Finn Thain fthain@telegraphics.com.au
commit 865ad2f2201dc18685ba2686f13217f8b3a9c52c upstream.
The netif_stop_queue() call in sonic_send_packet() races with the netif_wake_queue() call in sonic_interrupt(). This causes issues like "NETDEV WATCHDOG: eth0 (macsonic): transmit queue 0 timed out". Fix this by disabling interrupts when accessing tx_skb[] and next_tx. Update a comment to clarify the synchronization properties.
Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 49 +++++++++++++++++++++++++---------- drivers/net/ethernet/natsemi/sonic.h | 1 2 files changed, 36 insertions(+), 14 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -64,6 +64,8 @@ static int sonic_open(struct net_device
netif_dbg(lp, ifup, dev, "%s: initializing sonic driver\n", __func__);
+ spin_lock_init(&lp->lock); + for (i = 0; i < SONIC_NUM_RRS; i++) { struct sk_buff *skb = netdev_alloc_skb(dev, SONIC_RBSIZE + 2); if (skb == NULL) { @@ -206,8 +208,6 @@ static void sonic_tx_timeout(struct net_ * wake the tx queue * Concurrently with all of this, the SONIC is potentially writing to * the status flags of the TDs. - * Until some mutual exclusion is added, this code will not work with SMP. However, - * MIPS Jazz machines and m68k Macs were all uni-processor machines. */
static int sonic_send_packet(struct sk_buff *skb, struct net_device *dev) @@ -215,7 +215,8 @@ static int sonic_send_packet(struct sk_b struct sonic_local *lp = netdev_priv(dev); dma_addr_t laddr; int length; - int entry = lp->next_tx; + int entry; + unsigned long flags;
netif_dbg(lp, tx_queued, dev, "%s: skb=%p\n", __func__, skb);
@@ -237,6 +238,10 @@ static int sonic_send_packet(struct sk_b return NETDEV_TX_OK; }
+ spin_lock_irqsave(&lp->lock, flags); + + entry = lp->next_tx; + sonic_tda_put(dev, entry, SONIC_TD_STATUS, 0); /* clear status */ sonic_tda_put(dev, entry, SONIC_TD_FRAG_COUNT, 1); /* single fragment */ sonic_tda_put(dev, entry, SONIC_TD_PKTSIZE, length); /* length of packet */ @@ -246,10 +251,6 @@ static int sonic_send_packet(struct sk_b sonic_tda_put(dev, entry, SONIC_TD_LINK, sonic_tda_get(dev, entry, SONIC_TD_LINK) | SONIC_EOL);
- /* - * Must set tx_skb[entry] only after clearing status, and - * before clearing EOL and before stopping queue - */ wmb(); lp->tx_len[entry] = length; lp->tx_laddr[entry] = laddr; @@ -272,6 +273,8 @@ static int sonic_send_packet(struct sk_b
SONIC_WRITE(SONIC_CMD, SONIC_CR_TXP);
+ spin_unlock_irqrestore(&lp->lock, flags); + return NETDEV_TX_OK; }
@@ -284,9 +287,21 @@ static irqreturn_t sonic_interrupt(int i struct net_device *dev = dev_id; struct sonic_local *lp = netdev_priv(dev); int status; + unsigned long flags; + + /* The lock has two purposes. Firstly, it synchronizes sonic_interrupt() + * with sonic_send_packet() so that the two functions can share state. + * Secondly, it makes sonic_interrupt() re-entrant, as that is required + * by macsonic which must use two IRQs with different priority levels. + */ + spin_lock_irqsave(&lp->lock, flags); + + status = SONIC_READ(SONIC_ISR) & SONIC_IMR_DEFAULT; + if (!status) { + spin_unlock_irqrestore(&lp->lock, flags);
- if (!(status = SONIC_READ(SONIC_ISR) & SONIC_IMR_DEFAULT)) return IRQ_NONE; + }
do { if (status & SONIC_INT_PKTRX) { @@ -300,11 +315,12 @@ static irqreturn_t sonic_interrupt(int i int td_status; int freed_some = 0;
- /* At this point, cur_tx is the index of a TD that is one of: - * unallocated/freed (status set & tx_skb[entry] clear) - * allocated and sent (status set & tx_skb[entry] set ) - * allocated and not yet sent (status clear & tx_skb[entry] set ) - * still being allocated by sonic_send_packet (status clear & tx_skb[entry] clear) + /* The state of a Transmit Descriptor may be inferred + * from { tx_skb[entry], td_status } as follows. + * { clear, clear } => the TD has never been used + * { set, clear } => the TD was handed to SONIC + * { set, set } => the TD was handed back + * { clear, set } => the TD is available for re-use */
netif_dbg(lp, intr, dev, "%s: tx done\n", __func__); @@ -406,7 +422,12 @@ static irqreturn_t sonic_interrupt(int i /* load CAM done */ if (status & SONIC_INT_LCD) SONIC_WRITE(SONIC_ISR, SONIC_INT_LCD); /* clear the interrupt */ - } while((status = SONIC_READ(SONIC_ISR) & SONIC_IMR_DEFAULT)); + + status = SONIC_READ(SONIC_ISR) & SONIC_IMR_DEFAULT; + } while (status); + + spin_unlock_irqrestore(&lp->lock, flags); + return IRQ_HANDLED; }
--- a/drivers/net/ethernet/natsemi/sonic.h +++ b/drivers/net/ethernet/natsemi/sonic.h @@ -322,6 +322,7 @@ struct sonic_local { int msg_enable; struct device *device; /* generic device */ struct net_device_stats stats; + spinlock_t lock; };
#define TX_TIMEOUT (3 * HZ)
From: Finn Thain fthain@telegraphics.com.au
commit 5fedabf5a70be26b19d7520f09f12a62274317c6 upstream.
The chip can change a packet's descriptor status flags at any time. However, an active interrupt flag gets cleared rather late. This allows a race condition that could theoretically lose an interrupt. Fix this by clearing asserted interrupt flags immediately.
Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -304,10 +304,11 @@ static irqreturn_t sonic_interrupt(int i }
do { + SONIC_WRITE(SONIC_ISR, status); /* clear the interrupt(s) */ + if (status & SONIC_INT_PKTRX) { netif_dbg(lp, intr, dev, "%s: packet rx\n", __func__); sonic_rx(dev); /* got packet(s) */ - SONIC_WRITE(SONIC_ISR, SONIC_INT_PKTRX); /* clear the interrupt */ }
if (status & SONIC_INT_TXDN) { @@ -362,7 +363,6 @@ static irqreturn_t sonic_interrupt(int i if (freed_some || lp->tx_skb[entry] == NULL) netif_wake_queue(dev); /* The ring is no longer full */ lp->cur_tx = entry; - SONIC_WRITE(SONIC_ISR, SONIC_INT_TXDN); /* clear the interrupt */ }
/* @@ -372,42 +372,31 @@ static irqreturn_t sonic_interrupt(int i netif_dbg(lp, rx_err, dev, "%s: rx fifo overrun\n", __func__); lp->stats.rx_fifo_errors++; - SONIC_WRITE(SONIC_ISR, SONIC_INT_RFO); /* clear the interrupt */ } if (status & SONIC_INT_RDE) { netif_dbg(lp, rx_err, dev, "%s: rx descriptors exhausted\n", __func__); lp->stats.rx_dropped++; - SONIC_WRITE(SONIC_ISR, SONIC_INT_RDE); /* clear the interrupt */ } if (status & SONIC_INT_RBAE) { netif_dbg(lp, rx_err, dev, "%s: rx buffer area exceeded\n", __func__); lp->stats.rx_dropped++; - SONIC_WRITE(SONIC_ISR, SONIC_INT_RBAE); /* clear the interrupt */ }
/* counter overruns; all counters are 16bit wide */ - if (status & SONIC_INT_FAE) { + if (status & SONIC_INT_FAE) lp->stats.rx_frame_errors += 65536; - SONIC_WRITE(SONIC_ISR, SONIC_INT_FAE); /* clear the interrupt */ - } - if (status & SONIC_INT_CRC) { + if (status & SONIC_INT_CRC) lp->stats.rx_crc_errors += 65536; - SONIC_WRITE(SONIC_ISR, SONIC_INT_CRC); /* clear the interrupt */ - } - if (status & SONIC_INT_MP) { + if (status & SONIC_INT_MP) lp->stats.rx_missed_errors += 65536; - SONIC_WRITE(SONIC_ISR, SONIC_INT_MP); /* clear the interrupt */ - }
/* transmit error */ - if (status & SONIC_INT_TXER) { + if (status & SONIC_INT_TXER) if (SONIC_READ(SONIC_TCR) & SONIC_TCR_FU) netif_dbg(lp, tx_err, dev, "%s: tx fifo underrun\n", __func__); - SONIC_WRITE(SONIC_ISR, SONIC_INT_TXER); /* clear the interrupt */ - }
/* bus retry */ if (status & SONIC_INT_BR) { @@ -416,13 +405,8 @@ static irqreturn_t sonic_interrupt(int i /* ... to help debug DMA problems causing endless interrupts. */ /* Bounce the eth interface to turn on the interrupt again. */ SONIC_WRITE(SONIC_IMR, 0); - SONIC_WRITE(SONIC_ISR, SONIC_INT_BR); /* clear the interrupt */ }
- /* load CAM done */ - if (status & SONIC_INT_LCD) - SONIC_WRITE(SONIC_ISR, SONIC_INT_LCD); /* clear the interrupt */ - status = SONIC_READ(SONIC_ISR) & SONIC_IMR_DEFAULT; } while (status);
From: Finn Thain fthain@telegraphics.com.au
commit e3885f576196ddfc670b3d53e745de96ffcb49ab upstream.
The driver accesses descriptor memory which is simultaneously accessed by the chip, so the compiler must not be allowed to re-order CPU accesses. sonic_buf_get() used 'volatile' to prevent that. sonic_buf_put() should have done so too but was overlooked.
Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.h +++ b/drivers/net/ethernet/natsemi/sonic.h @@ -345,30 +345,30 @@ static void sonic_msg_init(struct net_de as far as we can tell. */ /* OpenBSD calls this "SWO". I'd like to think that sonic_buf_put() is a much better name. */ -static inline void sonic_buf_put(void* base, int bitmode, +static inline void sonic_buf_put(u16 *base, int bitmode, int offset, __u16 val) { if (bitmode) #ifdef __BIG_ENDIAN - ((__u16 *) base + (offset*2))[1] = val; + __raw_writew(val, base + (offset * 2) + 1); #else - ((__u16 *) base + (offset*2))[0] = val; + __raw_writew(val, base + (offset * 2) + 0); #endif else - ((__u16 *) base)[offset] = val; + __raw_writew(val, base + (offset * 1) + 0); }
-static inline __u16 sonic_buf_get(void* base, int bitmode, +static inline __u16 sonic_buf_get(u16 *base, int bitmode, int offset) { if (bitmode) #ifdef __BIG_ENDIAN - return ((volatile __u16 *) base + (offset*2))[1]; + return __raw_readw(base + (offset * 2) + 1); #else - return ((volatile __u16 *) base + (offset*2))[0]; + return __raw_readw(base + (offset * 2) + 0); #endif else - return ((volatile __u16 *) base)[offset]; + return __raw_readw(base + (offset * 1) + 0); }
/* Inlines that you should actually use for reading/writing DMA buffers */
From: Finn Thain fthain@telegraphics.com.au
commit 427db97df1ee721c20bdc9a66db8a9e1da719855 upstream.
The tx_aborted_errors statistic should count packets flagged with EXD, EXC, FU, or BCM bits because those bits denote an aborted transmission. That corresponds to the bitmask 0x0446, not 0x0642. Use macros for these constants to avoid mistakes. Better to leave out FIFO Underruns (FU) as there's a separate counter for that purpose.
Don't lump all these errors in with the general tx_errors counter as that's used for tx timeout events.
On the rx side, don't count RDE and RBAE interrupts as dropped packets. These interrupts don't indicate a lost packet, just a lack of resources. When a lack of resources results in a lost packet, this gets reported in the rx_missed_errors counter (along with RFO events).
Don't double-count rx_frame_errors and rx_crc_errors.
Don't use the general rx_errors counter for events that already have special counters.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 21 +++++++-------------- drivers/net/ethernet/natsemi/sonic.h | 1 + 2 files changed, 8 insertions(+), 14 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -330,18 +330,19 @@ static irqreturn_t sonic_interrupt(int i if ((td_status = sonic_tda_get(dev, entry, SONIC_TD_STATUS)) == 0) break;
- if (td_status & 0x0001) { + if (td_status & SONIC_TCR_PTX) { lp->stats.tx_packets++; lp->stats.tx_bytes += sonic_tda_get(dev, entry, SONIC_TD_PKTSIZE); } else { - lp->stats.tx_errors++; - if (td_status & 0x0642) + if (td_status & (SONIC_TCR_EXD | + SONIC_TCR_EXC | SONIC_TCR_BCM)) lp->stats.tx_aborted_errors++; - if (td_status & 0x0180) + if (td_status & + (SONIC_TCR_NCRS | SONIC_TCR_CRLS)) lp->stats.tx_carrier_errors++; - if (td_status & 0x0020) + if (td_status & SONIC_TCR_OWC) lp->stats.tx_window_errors++; - if (td_status & 0x0004) + if (td_status & SONIC_TCR_FU) lp->stats.tx_fifo_errors++; }
@@ -371,17 +372,14 @@ static irqreturn_t sonic_interrupt(int i if (status & SONIC_INT_RFO) { netif_dbg(lp, rx_err, dev, "%s: rx fifo overrun\n", __func__); - lp->stats.rx_fifo_errors++; } if (status & SONIC_INT_RDE) { netif_dbg(lp, rx_err, dev, "%s: rx descriptors exhausted\n", __func__); - lp->stats.rx_dropped++; } if (status & SONIC_INT_RBAE) { netif_dbg(lp, rx_err, dev, "%s: rx buffer area exceeded\n", __func__); - lp->stats.rx_dropped++; }
/* counter overruns; all counters are 16bit wide */ @@ -473,11 +471,6 @@ static void sonic_rx(struct net_device * sonic_rra_put(dev, entry, SONIC_RR_BUFADR_H, bufadr_h); } else { /* This should only happen, if we enable accepting broken packets. */ - lp->stats.rx_errors++; - if (status & SONIC_RCR_FAER) - lp->stats.rx_frame_errors++; - if (status & SONIC_RCR_CRCR) - lp->stats.rx_crc_errors++; } if (status & SONIC_RCR_LPKT) { /* --- a/drivers/net/ethernet/natsemi/sonic.h +++ b/drivers/net/ethernet/natsemi/sonic.h @@ -175,6 +175,7 @@ #define SONIC_TCR_NCRS 0x0100 #define SONIC_TCR_CRLS 0x0080 #define SONIC_TCR_EXC 0x0040 +#define SONIC_TCR_OWC 0x0020 #define SONIC_TCR_PMB 0x0008 #define SONIC_TCR_FU 0x0004 #define SONIC_TCR_BCM 0x0002
From: Finn Thain fthain@telegraphics.com.au
commit 9e311820f67e740f4fb8dcb82b4c4b5b05bdd1a5 upstream.
The SONIC can sometimes advance its rx buffer pointer (RRP register) without advancing its rx descriptor pointer (CRDA register). As a result the index of the current rx descriptor may not equal that of the current rx buffer. The driver mistakenly assumes that they are always equal. This assumption leads to incorrect packet lengths and possible packet duplication. Avoid this by calling a new function to locate the buffer corresponding to a given descriptor.
Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 35 ++++++++++++++++++++++++++++++----- drivers/net/ethernet/natsemi/sonic.h | 5 +++-- 2 files changed, 33 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -413,6 +413,21 @@ static irqreturn_t sonic_interrupt(int i return IRQ_HANDLED; }
+/* Return the array index corresponding to a given Receive Buffer pointer. */ +static int index_from_addr(struct sonic_local *lp, dma_addr_t addr, + unsigned int last) +{ + unsigned int i = last; + + do { + i = (i + 1) & SONIC_RRS_MASK; + if (addr == lp->rx_laddr[i]) + return i; + } while (i != last); + + return -ENOENT; +} + /* * We have a good packet(s), pass it/them up the network stack. */ @@ -432,6 +447,16 @@ static void sonic_rx(struct net_device *
status = sonic_rda_get(dev, entry, SONIC_RD_STATUS); if (status & SONIC_RCR_PRX) { + u32 addr = (sonic_rda_get(dev, entry, + SONIC_RD_PKTPTR_H) << 16) | + sonic_rda_get(dev, entry, SONIC_RD_PKTPTR_L); + int i = index_from_addr(lp, addr, entry); + + if (i < 0) { + WARN_ONCE(1, "failed to find buffer!\n"); + break; + } + /* Malloc up new buffer. */ new_skb = netdev_alloc_skb(dev, SONIC_RBSIZE + 2); if (new_skb == NULL) { @@ -453,7 +478,7 @@ static void sonic_rx(struct net_device *
/* now we have a new skb to replace it, pass the used one up the stack */ dma_unmap_single(lp->device, lp->rx_laddr[entry], SONIC_RBSIZE, DMA_FROM_DEVICE); - used_skb = lp->rx_skb[entry]; + used_skb = lp->rx_skb[i]; pkt_len = sonic_rda_get(dev, entry, SONIC_RD_PKTLEN); skb_trim(used_skb, pkt_len); used_skb->protocol = eth_type_trans(used_skb, dev); @@ -462,13 +487,13 @@ static void sonic_rx(struct net_device * lp->stats.rx_bytes += pkt_len;
/* and insert the new skb */ - lp->rx_laddr[entry] = new_laddr; - lp->rx_skb[entry] = new_skb; + lp->rx_laddr[i] = new_laddr; + lp->rx_skb[i] = new_skb;
bufadr_l = (unsigned long)new_laddr & 0xffff; bufadr_h = (unsigned long)new_laddr >> 16; - sonic_rra_put(dev, entry, SONIC_RR_BUFADR_L, bufadr_l); - sonic_rra_put(dev, entry, SONIC_RR_BUFADR_H, bufadr_h); + sonic_rra_put(dev, i, SONIC_RR_BUFADR_L, bufadr_l); + sonic_rra_put(dev, i, SONIC_RR_BUFADR_H, bufadr_h); } else { /* This should only happen, if we enable accepting broken packets. */ } --- a/drivers/net/ethernet/natsemi/sonic.h +++ b/drivers/net/ethernet/natsemi/sonic.h @@ -275,8 +275,9 @@ #define SONIC_NUM_RDS SONIC_NUM_RRS /* number of receive descriptors */ #define SONIC_NUM_TDS 16 /* number of transmit descriptors */
-#define SONIC_RDS_MASK (SONIC_NUM_RDS-1) -#define SONIC_TDS_MASK (SONIC_NUM_TDS-1) +#define SONIC_RRS_MASK (SONIC_NUM_RRS - 1) +#define SONIC_RDS_MASK (SONIC_NUM_RDS - 1) +#define SONIC_TDS_MASK (SONIC_NUM_TDS - 1)
#define SONIC_RBSIZE 1520 /* size of one resource buffer */
From: Finn Thain fthain@telegraphics.com.au
commit eaabfd19b2c787bbe88dc32424b9a43d67293422 upstream.
The while loop in sonic_rx() traverses the rx descriptor ring. It stops when it reaches a descriptor that the SONIC has not used. Each iteration advances the EOL flag so the SONIC can keep using more descriptors. Therefore, the while loop has no definite termination condition.
The algorithm described in the National Semiconductor literature is quite different. It consumes descriptors up to the one with its EOL flag set (which will also have its "in use" flag set). All freed descriptors are then returned to the ring at once, by adjusting the EOL flags (and link pointers).
Adopt the algorithm from datasheet as it's simpler, terminates quickly and avoids a lot of pointless descriptor EOL flag changes.
Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -436,6 +436,7 @@ static void sonic_rx(struct net_device * struct sonic_local *lp = netdev_priv(dev); int status; int entry = lp->cur_rx; + int prev_entry = lp->eol_rx;
while (sonic_rda_get(dev, entry, SONIC_RD_IN_USE) == 0) { struct sk_buff *used_skb; @@ -516,13 +517,21 @@ static void sonic_rx(struct net_device * /* * give back the descriptor */ - sonic_rda_put(dev, entry, SONIC_RD_LINK, - sonic_rda_get(dev, entry, SONIC_RD_LINK) | SONIC_EOL); sonic_rda_put(dev, entry, SONIC_RD_IN_USE, 1); - sonic_rda_put(dev, lp->eol_rx, SONIC_RD_LINK, - sonic_rda_get(dev, lp->eol_rx, SONIC_RD_LINK) & ~SONIC_EOL); - lp->eol_rx = entry; - lp->cur_rx = entry = (entry + 1) & SONIC_RDS_MASK; + + prev_entry = entry; + entry = (entry + 1) & SONIC_RDS_MASK; + } + + lp->cur_rx = entry; + + if (prev_entry != lp->eol_rx) { + /* Advance the EOL flag to put descriptors back into service */ + sonic_rda_put(dev, prev_entry, SONIC_RD_LINK, SONIC_EOL | + sonic_rda_get(dev, prev_entry, SONIC_RD_LINK)); + sonic_rda_put(dev, lp->eol_rx, SONIC_RD_LINK, ~SONIC_EOL & + sonic_rda_get(dev, lp->eol_rx, SONIC_RD_LINK)); + lp->eol_rx = prev_entry; } /* * If any worth-while packets have been received, netif_rx()
From: Finn Thain fthain@telegraphics.com.au
commit 94b166349503957079ef5e7d6f667f157aea014a upstream.
After sonic_tx_timeout() calls sonic_init(), it can happen that sonic_rx() will subsequently encounter a receive descriptor with no flags set. Remove the comment that says that this can't happen.
When giving a receive descriptor to the SONIC, clear the descriptor status field. That way, any rx descriptor with flags set can only be a newly received packet.
Don't process a descriptor without the LPKT bit set. The buffer is still in use by the SONIC.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -434,7 +434,6 @@ static int index_from_addr(struct sonic_ static void sonic_rx(struct net_device *dev) { struct sonic_local *lp = netdev_priv(dev); - int status; int entry = lp->cur_rx; int prev_entry = lp->eol_rx;
@@ -445,9 +444,10 @@ static void sonic_rx(struct net_device * u16 bufadr_l; u16 bufadr_h; int pkt_len; + u16 status = sonic_rda_get(dev, entry, SONIC_RD_STATUS);
- status = sonic_rda_get(dev, entry, SONIC_RD_STATUS); - if (status & SONIC_RCR_PRX) { + /* If the RD has LPKT set, the chip has finished with the RB */ + if ((status & SONIC_RCR_PRX) && (status & SONIC_RCR_LPKT)) { u32 addr = (sonic_rda_get(dev, entry, SONIC_RD_PKTPTR_H) << 16) | sonic_rda_get(dev, entry, SONIC_RD_PKTPTR_L); @@ -495,10 +495,6 @@ static void sonic_rx(struct net_device * bufadr_h = (unsigned long)new_laddr >> 16; sonic_rra_put(dev, i, SONIC_RR_BUFADR_L, bufadr_l); sonic_rra_put(dev, i, SONIC_RR_BUFADR_H, bufadr_h); - } else { - /* This should only happen, if we enable accepting broken packets. */ - } - if (status & SONIC_RCR_LPKT) { /* * this was the last packet out of the current receive buffer * give the buffer back to the SONIC @@ -511,12 +507,11 @@ static void sonic_rx(struct net_device * __func__); SONIC_WRITE(SONIC_ISR, SONIC_INT_RBE); /* clear the flag */ } - } else - printk(KERN_ERR "%s: rx desc without RCR_LPKT. Shouldn't happen !?\n", - dev->name); + } /* * give back the descriptor */ + sonic_rda_put(dev, entry, SONIC_RD_STATUS, 0); sonic_rda_put(dev, entry, SONIC_RD_IN_USE, 1);
prev_entry = entry;
From: Finn Thain fthain@telegraphics.com.au
commit 89ba879e95582d3bba55081e45b5409e883312ca upstream.
As soon as the driver is finished with a receive buffer it allocs a new one and overwrites the corresponding RRA entry with a new buffer pointer.
Problem is, the buffer pointer is split across two word-sized registers. It can't be updated in one atomic store. So this operation races with the chip while it stores received packets and advances its RRP register. This could result in memory corruption by a DMA write.
Avoid this problem by adding buffers only at the location given by the RWP register, in accordance with the National Semiconductor datasheet.
Re-factor this code into separate functions to calculate a RRA pointer and to update the RWP.
Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 150 ++++++++++++++++++++--------------- drivers/net/ethernet/natsemi/sonic.h | 18 +++- 2 files changed, 105 insertions(+), 63 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -428,6 +428,59 @@ static int index_from_addr(struct sonic_ return -ENOENT; }
+/* Allocate and map a new skb to be used as a receive buffer. */ +static bool sonic_alloc_rb(struct net_device *dev, struct sonic_local *lp, + struct sk_buff **new_skb, dma_addr_t *new_addr) +{ + *new_skb = netdev_alloc_skb(dev, SONIC_RBSIZE + 2); + if (!*new_skb) + return false; + + if (SONIC_BUS_SCALE(lp->dma_bitmode) == 2) + skb_reserve(*new_skb, 2); + + *new_addr = dma_map_single(lp->device, skb_put(*new_skb, SONIC_RBSIZE), + SONIC_RBSIZE, DMA_FROM_DEVICE); + if (!*new_addr) { + dev_kfree_skb(*new_skb); + *new_skb = NULL; + return false; + } + + return true; +} + +/* Place a new receive resource in the Receive Resource Area and update RWP. */ +static void sonic_update_rra(struct net_device *dev, struct sonic_local *lp, + dma_addr_t old_addr, dma_addr_t new_addr) +{ + unsigned int entry = sonic_rr_entry(dev, SONIC_READ(SONIC_RWP)); + unsigned int end = sonic_rr_entry(dev, SONIC_READ(SONIC_RRP)); + u32 buf; + + /* The resources in the range [RRP, RWP) belong to the SONIC. This loop + * scans the other resources in the RRA, those in the range [RWP, RRP). + */ + do { + buf = (sonic_rra_get(dev, entry, SONIC_RR_BUFADR_H) << 16) | + sonic_rra_get(dev, entry, SONIC_RR_BUFADR_L); + + if (buf == old_addr) + break; + + entry = (entry + 1) & SONIC_RRS_MASK; + } while (entry != end); + + WARN_ONCE(buf != old_addr, "failed to find resource!\n"); + + sonic_rra_put(dev, entry, SONIC_RR_BUFADR_H, new_addr >> 16); + sonic_rra_put(dev, entry, SONIC_RR_BUFADR_L, new_addr & 0xffff); + + entry = (entry + 1) & SONIC_RRS_MASK; + + SONIC_WRITE(SONIC_RWP, sonic_rr_addr(dev, entry)); +} + /* * We have a good packet(s), pass it/them up the network stack. */ @@ -436,18 +489,15 @@ static void sonic_rx(struct net_device * struct sonic_local *lp = netdev_priv(dev); int entry = lp->cur_rx; int prev_entry = lp->eol_rx; + bool rbe = false;
while (sonic_rda_get(dev, entry, SONIC_RD_IN_USE) == 0) { - struct sk_buff *used_skb; - struct sk_buff *new_skb; - dma_addr_t new_laddr; - u16 bufadr_l; - u16 bufadr_h; - int pkt_len; u16 status = sonic_rda_get(dev, entry, SONIC_RD_STATUS);
/* If the RD has LPKT set, the chip has finished with the RB */ if ((status & SONIC_RCR_PRX) && (status & SONIC_RCR_LPKT)) { + struct sk_buff *new_skb; + dma_addr_t new_laddr; u32 addr = (sonic_rda_get(dev, entry, SONIC_RD_PKTPTR_H) << 16) | sonic_rda_get(dev, entry, SONIC_RD_PKTPTR_L); @@ -458,55 +508,35 @@ static void sonic_rx(struct net_device * break; }
- /* Malloc up new buffer. */ - new_skb = netdev_alloc_skb(dev, SONIC_RBSIZE + 2); - if (new_skb == NULL) { + if (sonic_alloc_rb(dev, lp, &new_skb, &new_laddr)) { + struct sk_buff *used_skb = lp->rx_skb[i]; + int pkt_len; + + /* Pass the used buffer up the stack */ + dma_unmap_single(lp->device, addr, SONIC_RBSIZE, + DMA_FROM_DEVICE); + + pkt_len = sonic_rda_get(dev, entry, + SONIC_RD_PKTLEN); + skb_trim(used_skb, pkt_len); + used_skb->protocol = eth_type_trans(used_skb, + dev); + netif_rx(used_skb); + lp->stats.rx_packets++; + lp->stats.rx_bytes += pkt_len; + + lp->rx_skb[i] = new_skb; + lp->rx_laddr[i] = new_laddr; + } else { + /* Failed to obtain a new buffer so re-use it */ + new_laddr = addr; lp->stats.rx_dropped++; - break; } - /* provide 16 byte IP header alignment unless DMA requires otherwise */ - if(SONIC_BUS_SCALE(lp->dma_bitmode) == 2) - skb_reserve(new_skb, 2); - - new_laddr = dma_map_single(lp->device, skb_put(new_skb, SONIC_RBSIZE), - SONIC_RBSIZE, DMA_FROM_DEVICE); - if (!new_laddr) { - dev_kfree_skb(new_skb); - printk(KERN_ERR "%s: Failed to map rx buffer, dropping packet.\n", dev->name); - lp->stats.rx_dropped++; - break; - } - - /* now we have a new skb to replace it, pass the used one up the stack */ - dma_unmap_single(lp->device, lp->rx_laddr[entry], SONIC_RBSIZE, DMA_FROM_DEVICE); - used_skb = lp->rx_skb[i]; - pkt_len = sonic_rda_get(dev, entry, SONIC_RD_PKTLEN); - skb_trim(used_skb, pkt_len); - used_skb->protocol = eth_type_trans(used_skb, dev); - netif_rx(used_skb); - lp->stats.rx_packets++; - lp->stats.rx_bytes += pkt_len; - - /* and insert the new skb */ - lp->rx_laddr[i] = new_laddr; - lp->rx_skb[i] = new_skb; - - bufadr_l = (unsigned long)new_laddr & 0xffff; - bufadr_h = (unsigned long)new_laddr >> 16; - sonic_rra_put(dev, i, SONIC_RR_BUFADR_L, bufadr_l); - sonic_rra_put(dev, i, SONIC_RR_BUFADR_H, bufadr_h); - /* - * this was the last packet out of the current receive buffer - * give the buffer back to the SONIC + /* If RBE is already asserted when RWP advances then + * it's safe to clear RBE after processing this packet. */ - lp->cur_rwp += SIZEOF_SONIC_RR * SONIC_BUS_SCALE(lp->dma_bitmode); - if (lp->cur_rwp >= lp->rra_end) lp->cur_rwp = lp->rra_laddr & 0xffff; - SONIC_WRITE(SONIC_RWP, lp->cur_rwp); - if (SONIC_READ(SONIC_ISR) & SONIC_INT_RBE) { - netif_dbg(lp, rx_err, dev, "%s: rx buffer exhausted\n", - __func__); - SONIC_WRITE(SONIC_ISR, SONIC_INT_RBE); /* clear the flag */ - } + rbe = rbe || SONIC_READ(SONIC_ISR) & SONIC_INT_RBE; + sonic_update_rra(dev, lp, addr, new_laddr); } /* * give back the descriptor @@ -528,6 +558,9 @@ static void sonic_rx(struct net_device * sonic_rda_get(dev, lp->eol_rx, SONIC_RD_LINK)); lp->eol_rx = prev_entry; } + + if (rbe) + SONIC_WRITE(SONIC_ISR, SONIC_INT_RBE); /* * If any worth-while packets have been received, netif_rx() * has done a mark_bh(NET_BH) for us and will work on them @@ -642,15 +675,10 @@ static int sonic_init(struct net_device }
/* initialize all RRA registers */ - lp->rra_end = (lp->rra_laddr + SONIC_NUM_RRS * SIZEOF_SONIC_RR * - SONIC_BUS_SCALE(lp->dma_bitmode)) & 0xffff; - lp->cur_rwp = (lp->rra_laddr + (SONIC_NUM_RRS - 1) * SIZEOF_SONIC_RR * - SONIC_BUS_SCALE(lp->dma_bitmode)) & 0xffff; - - SONIC_WRITE(SONIC_RSA, lp->rra_laddr & 0xffff); - SONIC_WRITE(SONIC_REA, lp->rra_end); - SONIC_WRITE(SONIC_RRP, lp->rra_laddr & 0xffff); - SONIC_WRITE(SONIC_RWP, lp->cur_rwp); + SONIC_WRITE(SONIC_RSA, sonic_rr_addr(dev, 0)); + SONIC_WRITE(SONIC_REA, sonic_rr_addr(dev, SONIC_NUM_RRS)); + SONIC_WRITE(SONIC_RRP, sonic_rr_addr(dev, 0)); + SONIC_WRITE(SONIC_RWP, sonic_rr_addr(dev, SONIC_NUM_RRS - 1)); SONIC_WRITE(SONIC_URRA, lp->rra_laddr >> 16); SONIC_WRITE(SONIC_EOBC, (SONIC_RBSIZE >> 1) - (lp->dma_bitmode ? 2 : 1));
--- a/drivers/net/ethernet/natsemi/sonic.h +++ b/drivers/net/ethernet/natsemi/sonic.h @@ -314,8 +314,6 @@ struct sonic_local { u32 rda_laddr; /* logical DMA address of RDA */ dma_addr_t rx_laddr[SONIC_NUM_RRS]; /* logical DMA addresses of rx skbuffs */ dma_addr_t tx_laddr[SONIC_NUM_TDS]; /* logical DMA addresses of tx skbuffs */ - unsigned int rra_end; - unsigned int cur_rwp; unsigned int cur_rx; unsigned int cur_tx; /* first unacked transmit packet */ unsigned int eol_rx; @@ -450,6 +448,22 @@ static inline __u16 sonic_rra_get(struct (entry * SIZEOF_SONIC_RR) + offset); }
+static inline u16 sonic_rr_addr(struct net_device *dev, int entry) +{ + struct sonic_local *lp = netdev_priv(dev); + + return lp->rra_laddr + + entry * SIZEOF_SONIC_RR * SONIC_BUS_SCALE(lp->dma_bitmode); +} + +static inline u16 sonic_rr_entry(struct net_device *dev, u16 addr) +{ + struct sonic_local *lp = netdev_priv(dev); + + return (addr - (u16)lp->rra_laddr) / (SIZEOF_SONIC_RR * + SONIC_BUS_SCALE(lp->dma_bitmode)); +} + static const char version[] = "sonic.c:v0.92 20.9.98 tsbogend@alpha.franken.de\n";
From: Finn Thain fthain@telegraphics.com.au
commit 3f4b7e6a2be982fd8820a2b54d46dd9c351db899 upstream.
Make sure the SONIC's DMA engine is idle before altering the transmit and receive descriptors. Add a helper for this as it will be needed again.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 25 +++++++++++++++++++++++++ drivers/net/ethernet/natsemi/sonic.h | 3 +++ 2 files changed, 28 insertions(+)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -116,6 +116,24 @@ static int sonic_open(struct net_device return 0; }
+/* Wait for the SONIC to become idle. */ +static void sonic_quiesce(struct net_device *dev, u16 mask) +{ + struct sonic_local * __maybe_unused lp = netdev_priv(dev); + int i; + u16 bits; + + for (i = 0; i < 1000; ++i) { + bits = SONIC_READ(SONIC_CMD) & mask; + if (!bits) + return; + if (irqs_disabled() || in_interrupt()) + udelay(20); + else + usleep_range(100, 200); + } + WARN_ONCE(1, "command deadline expired! 0x%04x\n", bits); +}
/* * Close the SONIC device @@ -132,6 +150,9 @@ static int sonic_close(struct net_device /* * stop the SONIC, disable interrupts */ + SONIC_WRITE(SONIC_CMD, SONIC_CR_RXDIS); + sonic_quiesce(dev, SONIC_CR_ALL); + SONIC_WRITE(SONIC_IMR, 0); SONIC_WRITE(SONIC_ISR, 0x7fff); SONIC_WRITE(SONIC_CMD, SONIC_CR_RST); @@ -171,6 +192,9 @@ static void sonic_tx_timeout(struct net_ * put the Sonic into software-reset mode and * disable all interrupts before releasing DMA buffers */ + SONIC_WRITE(SONIC_CMD, SONIC_CR_RXDIS); + sonic_quiesce(dev, SONIC_CR_ALL); + SONIC_WRITE(SONIC_IMR, 0); SONIC_WRITE(SONIC_ISR, 0x7fff); SONIC_WRITE(SONIC_CMD, SONIC_CR_RST); @@ -658,6 +682,7 @@ static int sonic_init(struct net_device */ SONIC_WRITE(SONIC_CMD, 0); SONIC_WRITE(SONIC_CMD, SONIC_CR_RXDIS); + sonic_quiesce(dev, SONIC_CR_ALL);
/* * initialize the receive resource area --- a/drivers/net/ethernet/natsemi/sonic.h +++ b/drivers/net/ethernet/natsemi/sonic.h @@ -110,6 +110,9 @@ #define SONIC_CR_TXP 0x0002 #define SONIC_CR_HTX 0x0001
+#define SONIC_CR_ALL (SONIC_CR_LCAM | SONIC_CR_RRRA | \ + SONIC_CR_RXEN | SONIC_CR_TXP) + /* * SONIC data configuration bits */
From: Finn Thain fthain@telegraphics.com.au
commit 27e0c31c5f27c1d1a1d9d135c123069f60dcf97b upstream.
There are several issues relating to command register usage during chip initialization.
Firstly, the SONIC sometimes comes out of software reset with the Start Timer bit set. This gets logged as,
macsonic macsonic eth0: sonic_init: status=24, i=101
Avoid this by giving the Stop Timer command earlier than later.
Secondly, the loop that waits for the Read RRA command to complete has the break condition inverted. That's why the for loop iterates until its termination condition. Call the helper for this instead.
Finally, give the Receiver Enable command after clearing interrupts, not before, to avoid the possibility of losing an interrupt.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -664,7 +664,6 @@ static void sonic_multicast_list(struct */ static int sonic_init(struct net_device *dev) { - unsigned int cmd; struct sonic_local *lp = netdev_priv(dev); int i;
@@ -681,7 +680,7 @@ static int sonic_init(struct net_device * enable interrupts, then completely initialize the SONIC */ SONIC_WRITE(SONIC_CMD, 0); - SONIC_WRITE(SONIC_CMD, SONIC_CR_RXDIS); + SONIC_WRITE(SONIC_CMD, SONIC_CR_RXDIS | SONIC_CR_STP); sonic_quiesce(dev, SONIC_CR_ALL);
/* @@ -711,14 +710,7 @@ static int sonic_init(struct net_device netif_dbg(lp, ifup, dev, "%s: issuing RRRA command\n", __func__);
SONIC_WRITE(SONIC_CMD, SONIC_CR_RRRA); - i = 0; - while (i++ < 100) { - if (SONIC_READ(SONIC_CMD) & SONIC_CR_RRRA) - break; - } - - netif_dbg(lp, ifup, dev, "%s: status=%x, i=%d\n", __func__, - SONIC_READ(SONIC_CMD), i); + sonic_quiesce(dev, SONIC_CR_RRRA);
/* * Initialize the receive descriptors so that they @@ -806,15 +798,11 @@ static int sonic_init(struct net_device * enable receiver, disable loopback * and enable all interrupts */ - SONIC_WRITE(SONIC_CMD, SONIC_CR_RXEN | SONIC_CR_STP); SONIC_WRITE(SONIC_RCR, SONIC_RCR_DEFAULT); SONIC_WRITE(SONIC_TCR, SONIC_TCR_DEFAULT); SONIC_WRITE(SONIC_ISR, 0x7fff); SONIC_WRITE(SONIC_IMR, SONIC_IMR_DEFAULT); - - cmd = SONIC_READ(SONIC_CMD); - if ((cmd & SONIC_CR_RXEN) == 0 || (cmd & SONIC_CR_STP) == 0) - printk(KERN_ERR "sonic_init: failed, status=%x\n", cmd); + SONIC_WRITE(SONIC_CMD, SONIC_CR_RXEN);
netif_dbg(lp, ifup, dev, "%s: new status=%x\n", __func__, SONIC_READ(SONIC_CMD));
From: Finn Thain fthain@telegraphics.com.au
commit 772f66421d5aa0b9f256056f513bbc38ac132271 upstream.
Section 4.3.1 of the datasheet says,
This bit [TXP] must not be set if a Load CAM operation is in progress (LCAM is set). The SONIC will lock up if both bits are set simultaneously.
Testing has shown that the driver sometimes attempts to set LCAM while TXP is set. Avoid this by waiting for command completion before and after giving the LCAM command.
After issuing the Load CAM command, poll for !SONIC_CR_LCAM rather than SONIC_INT_LCD, because the SONIC_CR_TXP bit can't be used until !SONIC_CR_LCAM.
When in reset mode, take the opportunity to reset the CAM Enable register.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -634,6 +634,8 @@ static void sonic_multicast_list(struct (netdev_mc_count(dev) > 15)) { rcr |= SONIC_RCR_AMC; } else { + unsigned long flags; + netif_dbg(lp, ifup, dev, "%s: mc_count %d\n", __func__, netdev_mc_count(dev)); sonic_set_cam_enable(dev, 1); /* always enable our own address */ @@ -647,9 +649,14 @@ static void sonic_multicast_list(struct i++; } SONIC_WRITE(SONIC_CDC, 16); - /* issue Load CAM command */ SONIC_WRITE(SONIC_CDP, lp->cda_laddr & 0xffff); + + /* LCAM and TXP commands can't be used simultaneously */ + spin_lock_irqsave(&lp->lock, flags); + sonic_quiesce(dev, SONIC_CR_TXP); SONIC_WRITE(SONIC_CMD, SONIC_CR_LCAM); + sonic_quiesce(dev, SONIC_CR_LCAM); + spin_unlock_irqrestore(&lp->lock, flags); } }
@@ -675,6 +682,9 @@ static int sonic_init(struct net_device SONIC_WRITE(SONIC_ISR, 0x7fff); SONIC_WRITE(SONIC_CMD, SONIC_CR_RST);
+ /* While in reset mode, clear CAM Enable register */ + SONIC_WRITE(SONIC_CE, 0); + /* * clear software reset flag, disable receiver, clear and * enable interrupts, then completely initialize the SONIC @@ -785,14 +795,7 @@ static int sonic_init(struct net_device * load the CAM */ SONIC_WRITE(SONIC_CMD, SONIC_CR_LCAM); - - i = 0; - while (i++ < 100) { - if (SONIC_READ(SONIC_ISR) & SONIC_INT_LCD) - break; - } - netif_dbg(lp, ifup, dev, "%s: CMD=%x, ISR=%x, i=%d\n", __func__, - SONIC_READ(SONIC_CMD), SONIC_READ(SONIC_ISR), i); + sonic_quiesce(dev, SONIC_CR_LCAM);
/* * enable receiver, disable loopback
From: Finn Thain fthain@telegraphics.com.au
commit 686f85d71d095f1d26b807e23b0f0bfd22042c45 upstream.
Section 5.5.3.2 of the datasheet says,
If FIFO Underrun, Byte Count Mismatch, Excessive Collision, or Excessive Deferral (if enabled) errors occur, transmission ceases.
In this situation, the chip asserts a TXER interrupt rather than TXDN. But the handler for the TXDN is the only way that the transmit queue gets restarted. Hence, an aborted transmission can result in a watchdog timeout.
This problem can be reproduced on congested link, as that can result in excessive transmitter collisions. Another way to reproduce this is with a FIFO Underrun, which may be caused by DMA latency.
In event of a TXER interrupt, prevent a watchdog timeout by restarting transmission.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Tested-by: Stan Johnson userm57@yahoo.com Signed-off-by: Finn Thain fthain@telegraphics.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/natsemi/sonic.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/natsemi/sonic.c +++ b/drivers/net/ethernet/natsemi/sonic.c @@ -415,10 +415,19 @@ static irqreturn_t sonic_interrupt(int i lp->stats.rx_missed_errors += 65536;
/* transmit error */ - if (status & SONIC_INT_TXER) - if (SONIC_READ(SONIC_TCR) & SONIC_TCR_FU) - netif_dbg(lp, tx_err, dev, "%s: tx fifo underrun\n", - __func__); + if (status & SONIC_INT_TXER) { + u16 tcr = SONIC_READ(SONIC_TCR); + + netif_dbg(lp, tx_err, dev, "%s: TXER intr, TCR %04x\n", + __func__, tcr); + + if (tcr & (SONIC_TCR_EXD | SONIC_TCR_EXC | + SONIC_TCR_FU | SONIC_TCR_BCM)) { + /* Aborted transmission. Try again. */ + netif_stop_queue(dev); + SONIC_WRITE(SONIC_CMD, SONIC_CR_TXP); + } + }
/* bus retry */ if (status & SONIC_INT_BR) {
From: Wen Huang huangwenabc@gmail.com
commit e5e884b42639c74b5b57dc277909915c0aefc8bb upstream.
add_ie_rates() copys rates without checking the length in bss descriptor from remote AP.when victim connects to remote attacker, this may trigger buffer overflow. lbs_ibss_join_existing() copys rates without checking the length in bss descriptor from remote IBSS node.when victim connects to remote attacker, this may trigger buffer overflow. Fix them by putting the length check before performing copy.
This fix addresses CVE-2019-14896 and CVE-2019-14897. This also fix build warning of mixed declarations and code.
Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Wen Huang huangwenabc@gmail.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/marvell/libertas/cfg.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
--- a/drivers/net/wireless/marvell/libertas/cfg.c +++ b/drivers/net/wireless/marvell/libertas/cfg.c @@ -273,6 +273,10 @@ add_ie_rates(u8 *tlv, const u8 *ie, int int hw, ap, ap_max = ie[1]; u8 hw_rate;
+ if (ap_max > MAX_RATES) { + lbs_deb_assoc("invalid rates\n"); + return tlv; + } /* Advance past IE header */ ie += 2;
@@ -1717,6 +1721,9 @@ static int lbs_ibss_join_existing(struct struct cmd_ds_802_11_ad_hoc_join cmd; u8 preamble = RADIO_PREAMBLE_SHORT; int ret = 0; + int hw, i; + u8 rates_max; + u8 *rates;
/* TODO: set preamble based on scan result */ ret = lbs_set_radio(priv, preamble, 1); @@ -1775,9 +1782,12 @@ static int lbs_ibss_join_existing(struct if (!rates_eid) { lbs_add_rates(cmd.bss.rates); } else { - int hw, i; - u8 rates_max = rates_eid[1]; - u8 *rates = cmd.bss.rates; + rates_max = rates_eid[1]; + if (rates_max > MAX_RATES) { + lbs_deb_join("invalid rates"); + goto out; + } + rates = cmd.bss.rates; for (hw = 0; hw < ARRAY_SIZE(lbs_rates); hw++) { u8 hw_rate = lbs_rates[hw].bitrate / 5; for (i = 0; i < rates_max; i++) {
From: Hans Verkuil hverkuil-cisco@xs4all.nl
commit ee8951e56c0f960b9621636603a822811cef3158 upstream.
v4l2_vbi_format, v4l2_sliced_vbi_format and v4l2_sdr_format have a reserved array at the end that should be zeroed by drivers as per the V4L2 spec. Older drivers often do not do this, so just handle this in the core.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/v4l2-core/v4l2-ioctl.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
--- a/drivers/media/v4l2-core/v4l2-ioctl.c +++ b/drivers/media/v4l2-core/v4l2-ioctl.c @@ -1605,12 +1605,12 @@ static int v4l_s_fmt(const struct v4l2_i case V4L2_BUF_TYPE_VBI_CAPTURE: if (unlikely(!ops->vidioc_s_fmt_vbi_cap)) break; - CLEAR_AFTER_FIELD(p, fmt.vbi); + CLEAR_AFTER_FIELD(p, fmt.vbi.flags); return ops->vidioc_s_fmt_vbi_cap(file, fh, arg); case V4L2_BUF_TYPE_SLICED_VBI_CAPTURE: if (unlikely(!ops->vidioc_s_fmt_sliced_vbi_cap)) break; - CLEAR_AFTER_FIELD(p, fmt.sliced); + CLEAR_AFTER_FIELD(p, fmt.sliced.io_size); return ops->vidioc_s_fmt_sliced_vbi_cap(file, fh, arg); case V4L2_BUF_TYPE_VIDEO_OUTPUT: if (unlikely(!ops->vidioc_s_fmt_vid_out)) @@ -1636,22 +1636,22 @@ static int v4l_s_fmt(const struct v4l2_i case V4L2_BUF_TYPE_VBI_OUTPUT: if (unlikely(!ops->vidioc_s_fmt_vbi_out)) break; - CLEAR_AFTER_FIELD(p, fmt.vbi); + CLEAR_AFTER_FIELD(p, fmt.vbi.flags); return ops->vidioc_s_fmt_vbi_out(file, fh, arg); case V4L2_BUF_TYPE_SLICED_VBI_OUTPUT: if (unlikely(!ops->vidioc_s_fmt_sliced_vbi_out)) break; - CLEAR_AFTER_FIELD(p, fmt.sliced); + CLEAR_AFTER_FIELD(p, fmt.sliced.io_size); return ops->vidioc_s_fmt_sliced_vbi_out(file, fh, arg); case V4L2_BUF_TYPE_SDR_CAPTURE: if (unlikely(!ops->vidioc_s_fmt_sdr_cap)) break; - CLEAR_AFTER_FIELD(p, fmt.sdr); + CLEAR_AFTER_FIELD(p, fmt.sdr.buffersize); return ops->vidioc_s_fmt_sdr_cap(file, fh, arg); case V4L2_BUF_TYPE_SDR_OUTPUT: if (unlikely(!ops->vidioc_s_fmt_sdr_out)) break; - CLEAR_AFTER_FIELD(p, fmt.sdr); + CLEAR_AFTER_FIELD(p, fmt.sdr.buffersize); return ops->vidioc_s_fmt_sdr_out(file, fh, arg); case V4L2_BUF_TYPE_META_CAPTURE: if (unlikely(!ops->vidioc_s_fmt_meta_cap)) @@ -1707,12 +1707,12 @@ static int v4l_try_fmt(const struct v4l2 case V4L2_BUF_TYPE_VBI_CAPTURE: if (unlikely(!ops->vidioc_try_fmt_vbi_cap)) break; - CLEAR_AFTER_FIELD(p, fmt.vbi); + CLEAR_AFTER_FIELD(p, fmt.vbi.flags); return ops->vidioc_try_fmt_vbi_cap(file, fh, arg); case V4L2_BUF_TYPE_SLICED_VBI_CAPTURE: if (unlikely(!ops->vidioc_try_fmt_sliced_vbi_cap)) break; - CLEAR_AFTER_FIELD(p, fmt.sliced); + CLEAR_AFTER_FIELD(p, fmt.sliced.io_size); return ops->vidioc_try_fmt_sliced_vbi_cap(file, fh, arg); case V4L2_BUF_TYPE_VIDEO_OUTPUT: if (unlikely(!ops->vidioc_try_fmt_vid_out)) @@ -1738,22 +1738,22 @@ static int v4l_try_fmt(const struct v4l2 case V4L2_BUF_TYPE_VBI_OUTPUT: if (unlikely(!ops->vidioc_try_fmt_vbi_out)) break; - CLEAR_AFTER_FIELD(p, fmt.vbi); + CLEAR_AFTER_FIELD(p, fmt.vbi.flags); return ops->vidioc_try_fmt_vbi_out(file, fh, arg); case V4L2_BUF_TYPE_SLICED_VBI_OUTPUT: if (unlikely(!ops->vidioc_try_fmt_sliced_vbi_out)) break; - CLEAR_AFTER_FIELD(p, fmt.sliced); + CLEAR_AFTER_FIELD(p, fmt.sliced.io_size); return ops->vidioc_try_fmt_sliced_vbi_out(file, fh, arg); case V4L2_BUF_TYPE_SDR_CAPTURE: if (unlikely(!ops->vidioc_try_fmt_sdr_cap)) break; - CLEAR_AFTER_FIELD(p, fmt.sdr); + CLEAR_AFTER_FIELD(p, fmt.sdr.buffersize); return ops->vidioc_try_fmt_sdr_cap(file, fh, arg); case V4L2_BUF_TYPE_SDR_OUTPUT: if (unlikely(!ops->vidioc_try_fmt_sdr_out)) break; - CLEAR_AFTER_FIELD(p, fmt.sdr); + CLEAR_AFTER_FIELD(p, fmt.sdr.buffersize); return ops->vidioc_try_fmt_sdr_out(file, fh, arg); case V4L2_BUF_TYPE_META_CAPTURE: if (unlikely(!ops->vidioc_try_fmt_meta_cap))
From: Kadlecsik József kadlec@blackhole.kfki.hu
commit 32c72165dbd0e246e69d16a3ad348a4851afd415 upstream.
The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them.
Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/netfilter/ipset/ip_set.h | 7 ------- net/netfilter/ipset/ip_set_bitmap_gen.h | 2 +- net/netfilter/ipset/ip_set_bitmap_ip.c | 6 +++--- net/netfilter/ipset/ip_set_bitmap_ipmac.c | 6 +++--- net/netfilter/ipset/ip_set_bitmap_port.c | 6 +++--- 5 files changed, 10 insertions(+), 17 deletions(-)
--- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -445,13 +445,6 @@ ip6addrptr(const struct sk_buff *skb, bo sizeof(*addr)); }
-/* Calculate the bytes required to store the inclusive range of a-b */ -static inline int -bitmap_bytes(u32 a, u32 b) -{ - return 4 * ((((b - a + 8) / 8) + 3) / 4); -} - /* How often should the gc be run by default */ #define IPSET_GC_TIME (3 * 60)
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -75,7 +75,7 @@ mtype_flush(struct ip_set *set)
if (set->extensions & IPSET_EXT_DESTROY) mtype_ext_cleanup(set); - memset(map->members, 0, map->memsize); + bitmap_zero(map->members, map->elements); set->elements = 0; set->ext_size = 0; } --- a/net/netfilter/ipset/ip_set_bitmap_ip.c +++ b/net/netfilter/ipset/ip_set_bitmap_ip.c @@ -37,7 +37,7 @@ MODULE_ALIAS("ip_set_bitmap:ip");
/* Type structure */ struct bitmap_ip { - void *members; /* the set members */ + unsigned long *members; /* the set members */ u32 first_ip; /* host byte order, included in range */ u32 last_ip; /* host byte order, included in range */ u32 elements; /* number of max elements in the set */ @@ -220,7 +220,7 @@ init_map_ip(struct ip_set *set, struct b u32 first_ip, u32 last_ip, u32 elements, u32 hosts, u8 netmask) { - map->members = ip_set_alloc(map->memsize); + map->members = bitmap_zalloc(elements, GFP_KERNEL | __GFP_NOWARN); if (!map->members) return false; map->first_ip = first_ip; @@ -310,7 +310,7 @@ bitmap_ip_create(struct net *net, struct if (!map) return -ENOMEM;
- map->memsize = bitmap_bytes(0, elements - 1); + map->memsize = BITS_TO_LONGS(elements) * sizeof(unsigned long); set->variant = &bitmap_ip; if (!init_map_ip(set, map, first_ip, last_ip, elements, hosts, netmask)) { --- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c +++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c @@ -42,7 +42,7 @@ enum {
/* Type structure */ struct bitmap_ipmac { - void *members; /* the set members */ + unsigned long *members; /* the set members */ u32 first_ip; /* host byte order, included in range */ u32 last_ip; /* host byte order, included in range */ u32 elements; /* number of max elements in the set */ @@ -299,7 +299,7 @@ static bool init_map_ipmac(struct ip_set *set, struct bitmap_ipmac *map, u32 first_ip, u32 last_ip, u32 elements) { - map->members = ip_set_alloc(map->memsize); + map->members = bitmap_zalloc(elements, GFP_KERNEL | __GFP_NOWARN); if (!map->members) return false; map->first_ip = first_ip; @@ -360,7 +360,7 @@ bitmap_ipmac_create(struct net *net, str if (!map) return -ENOMEM;
- map->memsize = bitmap_bytes(0, elements - 1); + map->memsize = BITS_TO_LONGS(elements) * sizeof(unsigned long); set->variant = &bitmap_ipmac; if (!init_map_ipmac(set, map, first_ip, last_ip, elements)) { kfree(map); --- a/net/netfilter/ipset/ip_set_bitmap_port.c +++ b/net/netfilter/ipset/ip_set_bitmap_port.c @@ -30,7 +30,7 @@ MODULE_ALIAS("ip_set_bitmap:port");
/* Type structure */ struct bitmap_port { - void *members; /* the set members */ + unsigned long *members; /* the set members */ u16 first_port; /* host byte order, included in range */ u16 last_port; /* host byte order, included in range */ u32 elements; /* number of max elements in the set */ @@ -204,7 +204,7 @@ static bool init_map_port(struct ip_set *set, struct bitmap_port *map, u16 first_port, u16 last_port) { - map->members = ip_set_alloc(map->memsize); + map->members = bitmap_zalloc(map->elements, GFP_KERNEL | __GFP_NOWARN); if (!map->members) return false; map->first_port = first_port; @@ -244,7 +244,7 @@ bitmap_port_create(struct net *net, stru return -ENOMEM;
map->elements = elements; - map->memsize = bitmap_bytes(0, map->elements); + map->memsize = BITS_TO_LONGS(elements) * sizeof(unsigned long); set->variant = &bitmap_port; if (!init_map_port(set, map, first_port, last_port)) { kfree(map);
From: Pablo Neira Ayuso pablo@netfilter.org
commit 826035498ec14b77b62a44f0cb6b94d45530db6f upstream.
This new helper function validates that unknown family and chain type coming from userspace do not trigger an out-of-bound array access. Bail out in case __nft_chain_type_get() returns NULL from nft_chain_parse_hook().
Fixes: 9370761c56b6 ("netfilter: nf_tables: convert built-in tables/chains to chain types") Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/nf_tables_api.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -489,14 +489,27 @@ static inline u64 nf_tables_alloc_handle static const struct nft_chain_type *chain_type[NFPROTO_NUMPROTO][NFT_CHAIN_T_MAX];
static const struct nft_chain_type * +__nft_chain_type_get(u8 family, enum nft_chain_types type) +{ + if (family >= NFPROTO_NUMPROTO || + type >= NFT_CHAIN_T_MAX) + return NULL; + + return chain_type[family][type]; +} + +static const struct nft_chain_type * __nf_tables_chain_type_lookup(const struct nlattr *nla, u8 family) { + const struct nft_chain_type *type; int i;
for (i = 0; i < NFT_CHAIN_T_MAX; i++) { - if (chain_type[family][i] != NULL && - !nla_strcmp(nla, chain_type[family][i]->name)) - return chain_type[family][i]; + type = __nft_chain_type_get(family, i); + if (!type) + continue; + if (!nla_strcmp(nla, type->name)) + return type; } return NULL; } @@ -1095,11 +1108,8 @@ static void nf_tables_table_destroy(stru
void nft_register_chain_type(const struct nft_chain_type *ctype) { - if (WARN_ON(ctype->family >= NFPROTO_NUMPROTO)) - return; - nfnl_lock(NFNL_SUBSYS_NFTABLES); - if (WARN_ON(chain_type[ctype->family][ctype->type] != NULL)) { + if (WARN_ON(__nft_chain_type_get(ctype->family, ctype->type))) { nfnl_unlock(NFNL_SUBSYS_NFTABLES); return; } @@ -1551,7 +1561,10 @@ static int nft_chain_parse_hook(struct n hook->num = ntohl(nla_get_be32(ha[NFTA_HOOK_HOOKNUM])); hook->priority = ntohl(nla_get_be32(ha[NFTA_HOOK_PRIORITY]));
- type = chain_type[family][NFT_CHAIN_T_DEFAULT]; + type = __nft_chain_type_get(family, NFT_CHAIN_T_DEFAULT); + if (!type) + return -EOPNOTSUPP; + if (nla[NFTA_CHAIN_TYPE]) { type = nf_tables_chain_type_lookup(net, nla[NFTA_CHAIN_TYPE], family, autoload);
From: Pablo Neira Ayuso pablo@netfilter.org
commit eb014de4fd418de1a277913cba244e47274fe392 upstream.
This patch introduces a list of pending module requests. This new module list is composed of nft_module_request objects that contain the module name and one status field that tells if the module has been already loaded (the 'done' field).
In the first pass, from the preparation phase, the netlink command finds that a module is missing on this list. Then, a module request is allocated and added to this list and nft_request_module() returns -EAGAIN. This triggers the abort path with the autoload parameter set on from nfnetlink, request_module() is called and the module request enters the 'done' state. Since the mutex is released when loading modules from the abort phase, the module list is zapped so this is iteration occurs over a local list. Therefore, the request_module() calls happen when object lists are in consistent state (after fulling aborting the transaction) and the commit list is empty.
On the second pass, the netlink command will find that it already tried to load the module, so it does not request it again and nft_request_module() returns 0. Then, there is a look up to find the object that the command was missing. If the module was successfully loaded, the command proceeds normally since it finds the missing object in place, otherwise -ENOENT is reported to userspace.
This patch also updates nfnetlink to include the reason to enter the abort phase, which is required for this new autoload module rationale.
Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module") Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/netfilter/nfnetlink.h | 2 include/net/netns/nftables.h | 1 net/netfilter/nf_tables_api.c | 126 ++++++++++++++++++++++++------------ net/netfilter/nfnetlink.c | 6 - 4 files changed, 91 insertions(+), 44 deletions(-)
--- a/include/linux/netfilter/nfnetlink.h +++ b/include/linux/netfilter/nfnetlink.h @@ -31,7 +31,7 @@ struct nfnetlink_subsystem { const struct nfnl_callback *cb; /* callback for individual types */ struct module *owner; int (*commit)(struct net *net, struct sk_buff *skb); - int (*abort)(struct net *net, struct sk_buff *skb); + int (*abort)(struct net *net, struct sk_buff *skb, bool autoload); void (*cleanup)(struct net *net); bool (*valid_genid)(struct net *net, u32 genid); }; --- a/include/net/netns/nftables.h +++ b/include/net/netns/nftables.h @@ -7,6 +7,7 @@ struct netns_nftables { struct list_head tables; struct list_head commit_list; + struct list_head module_list; struct mutex commit_mutex; unsigned int base_seq; u8 gencursor; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -514,35 +514,45 @@ __nf_tables_chain_type_lookup(const stru return NULL; }
-/* - * Loading a module requires dropping mutex that guards the transaction. - * A different client might race to start a new transaction meanwhile. Zap the - * list of pending transaction and then restore it once the mutex is grabbed - * again. Users of this function return EAGAIN which implicitly triggers the - * transaction abort path to clean up the list of pending transactions. - */ +struct nft_module_request { + struct list_head list; + char module[MODULE_NAME_LEN]; + bool done; +}; + #ifdef CONFIG_MODULES -static void nft_request_module(struct net *net, const char *fmt, ...) +static int nft_request_module(struct net *net, const char *fmt, ...) { char module_name[MODULE_NAME_LEN]; - LIST_HEAD(commit_list); + struct nft_module_request *req; va_list args; int ret;
- list_splice_init(&net->nft.commit_list, &commit_list); - va_start(args, fmt); ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args); va_end(args); if (ret >= MODULE_NAME_LEN) - return; + return 0;
- mutex_unlock(&net->nft.commit_mutex); - request_module("%s", module_name); - mutex_lock(&net->nft.commit_mutex); + list_for_each_entry(req, &net->nft.module_list, list) { + if (!strcmp(req->module, module_name)) { + if (req->done) + return 0;
- WARN_ON_ONCE(!list_empty(&net->nft.commit_list)); - list_splice(&commit_list, &net->nft.commit_list); + /* A request to load this module already exists. */ + return -EAGAIN; + } + } + + req = kmalloc(sizeof(*req), GFP_KERNEL); + if (!req) + return -ENOMEM; + + req->done = false; + strlcpy(req->module, module_name, MODULE_NAME_LEN); + list_add_tail(&req->list, &net->nft.module_list); + + return -EAGAIN; } #endif
@@ -566,10 +576,9 @@ nf_tables_chain_type_lookup(struct net * lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (autoload) { - nft_request_module(net, "nft-chain-%u-%.*s", family, - nla_len(nla), (const char *)nla_data(nla)); - type = __nf_tables_chain_type_lookup(nla, family); - if (type != NULL) + if (nft_request_module(net, "nft-chain-%u-%.*s", family, + nla_len(nla), + (const char *)nla_data(nla)) == -EAGAIN) return ERR_PTR(-EAGAIN); } #endif @@ -2073,9 +2082,8 @@ static const struct nft_expr_type *__nft static int nft_expr_type_request_module(struct net *net, u8 family, struct nlattr *nla) { - nft_request_module(net, "nft-expr-%u-%.*s", family, - nla_len(nla), (char *)nla_data(nla)); - if (__nft_expr_type_get(family, nla)) + if (nft_request_module(net, "nft-expr-%u-%.*s", family, + nla_len(nla), (char *)nla_data(nla)) == -EAGAIN) return -EAGAIN;
return 0; @@ -2101,9 +2109,9 @@ static const struct nft_expr_type *nft_e if (nft_expr_type_request_module(net, family, nla) == -EAGAIN) return ERR_PTR(-EAGAIN);
- nft_request_module(net, "nft-expr-%.*s", - nla_len(nla), (char *)nla_data(nla)); - if (__nft_expr_type_get(family, nla)) + if (nft_request_module(net, "nft-expr-%.*s", + nla_len(nla), + (char *)nla_data(nla)) == -EAGAIN) return ERR_PTR(-EAGAIN); } #endif @@ -2194,9 +2202,10 @@ static int nf_tables_expr_parse(const st err = PTR_ERR(ops); #ifdef CONFIG_MODULES if (err == -EAGAIN) - nft_expr_type_request_module(ctx->net, - ctx->family, - tb[NFTA_EXPR_NAME]); + if (nft_expr_type_request_module(ctx->net, + ctx->family, + tb[NFTA_EXPR_NAME]) != -EAGAIN) + err = -ENOENT; #endif goto err1; } @@ -3033,8 +3042,7 @@ nft_select_set_ops(const struct nft_ctx lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (list_empty(&nf_tables_set_types)) { - nft_request_module(ctx->net, "nft-set"); - if (!list_empty(&nf_tables_set_types)) + if (nft_request_module(ctx->net, "nft-set") == -EAGAIN) return ERR_PTR(-EAGAIN); } #endif @@ -5160,8 +5168,7 @@ nft_obj_type_get(struct net *net, u32 ob lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (type == NULL) { - nft_request_module(net, "nft-obj-%u", objtype); - if (__nft_obj_type_get(objtype)) + if (nft_request_module(net, "nft-obj-%u", objtype) == -EAGAIN) return ERR_PTR(-EAGAIN); } #endif @@ -5777,8 +5784,7 @@ nft_flowtable_type_get(struct net *net, lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES if (type == NULL) { - nft_request_module(net, "nf-flowtable-%u", family); - if (__nft_flowtable_type_get(family)) + if (nft_request_module(net, "nf-flowtable-%u", family) == -EAGAIN) return ERR_PTR(-EAGAIN); } #endif @@ -6725,6 +6731,18 @@ static void nft_chain_del(struct nft_cha list_del_rcu(&chain->list); }
+static void nf_tables_module_autoload_cleanup(struct net *net) +{ + struct nft_module_request *req, *next; + + WARN_ON_ONCE(!list_empty(&net->nft.commit_list)); + list_for_each_entry_safe(req, next, &net->nft.module_list, list) { + WARN_ON_ONCE(!req->done); + list_del(&req->list); + kfree(req); + } +} + static void nf_tables_commit_release(struct net *net) { struct nft_trans *trans; @@ -6737,6 +6755,7 @@ static void nf_tables_commit_release(str * to prevent expensive synchronize_rcu() in commit phase. */ if (list_empty(&net->nft.commit_list)) { + nf_tables_module_autoload_cleanup(net); mutex_unlock(&net->nft.commit_mutex); return; } @@ -6751,6 +6770,7 @@ static void nf_tables_commit_release(str list_splice_tail_init(&net->nft.commit_list, &nf_tables_destroy_list); spin_unlock(&nf_tables_destroy_list_lock);
+ nf_tables_module_autoload_cleanup(net); mutex_unlock(&net->nft.commit_mutex);
schedule_work(&trans_destroy_work); @@ -6942,6 +6962,26 @@ static int nf_tables_commit(struct net * return 0; }
+static void nf_tables_module_autoload(struct net *net) +{ + struct nft_module_request *req, *next; + LIST_HEAD(module_list); + + list_splice_init(&net->nft.module_list, &module_list); + mutex_unlock(&net->nft.commit_mutex); + list_for_each_entry_safe(req, next, &module_list, list) { + if (req->done) { + list_del(&req->list); + kfree(req); + } else { + request_module("%s", req->module); + req->done = true; + } + } + mutex_lock(&net->nft.commit_mutex); + list_splice(&module_list, &net->nft.module_list); +} + static void nf_tables_abort_release(struct nft_trans *trans) { switch (trans->msg_type) { @@ -6971,7 +7011,7 @@ static void nf_tables_abort_release(stru kfree(trans); }
-static int __nf_tables_abort(struct net *net) +static int __nf_tables_abort(struct net *net, bool autoload) { struct nft_trans *trans, *next; struct nft_trans_elem *te; @@ -7093,6 +7133,11 @@ static int __nf_tables_abort(struct net nf_tables_abort_release(trans); }
+ if (autoload) + nf_tables_module_autoload(net); + else + nf_tables_module_autoload_cleanup(net); + return 0; }
@@ -7101,9 +7146,9 @@ static void nf_tables_cleanup(struct net nft_validate_state_update(net, NFT_VALIDATE_SKIP); }
-static int nf_tables_abort(struct net *net, struct sk_buff *skb) +static int nf_tables_abort(struct net *net, struct sk_buff *skb, bool autoload) { - int ret = __nf_tables_abort(net); + int ret = __nf_tables_abort(net, autoload);
mutex_unlock(&net->nft.commit_mutex);
@@ -7698,6 +7743,7 @@ static int __net_init nf_tables_init_net { INIT_LIST_HEAD(&net->nft.tables); INIT_LIST_HEAD(&net->nft.commit_list); + INIT_LIST_HEAD(&net->nft.module_list); mutex_init(&net->nft.commit_mutex); net->nft.base_seq = 1; net->nft.validate_state = NFT_VALIDATE_SKIP; @@ -7709,7 +7755,7 @@ static void __net_exit nf_tables_exit_ne { mutex_lock(&net->nft.commit_mutex); if (!list_empty(&net->nft.commit_list)) - __nf_tables_abort(net); + __nf_tables_abort(net, false); __nft_release_tables(net); mutex_unlock(&net->nft.commit_mutex); WARN_ON_ONCE(!list_empty(&net->nft.tables)); --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -476,7 +476,7 @@ ack: } done: if (status & NFNL_BATCH_REPLAY) { - ss->abort(net, oskb); + ss->abort(net, oskb, true); nfnl_err_reset(&err_list); kfree_skb(skb); module_put(ss->owner); @@ -487,11 +487,11 @@ done: status |= NFNL_BATCH_REPLAY; goto done; } else if (err) { - ss->abort(net, oskb); + ss->abort(net, oskb, false); netlink_ack(oskb, nlmsg_hdr(oskb), err, NULL); } } else { - ss->abort(net, oskb); + ss->abort(net, oskb, false); } if (ss->cleanup) ss->cleanup(net);
From: Martin Schiller ms@dev.tdt.de
commit e21dba7a4df4d93da237da65a096084b4f2e87b4 upstream.
This patch fixes 2 issues in x25_connect():
1. It makes absolutely no sense to reset the neighbour and the connection state after a (successful) nonblocking call of x25_connect. This prevents any connection from being established, since the response (call accept) cannot be processed.
2. Any further calls to x25_connect() while a call is pending should simply return, instead of creating new Call Request (on different logical channels).
This patch should also fix the "KASAN: null-ptr-deref Write in x25_connect" and "BUG: unable to handle kernel NULL pointer dereference in x25_connect" bugs reported by syzbot.
Signed-off-by: Martin Schiller ms@dev.tdt.de Reported-by: syzbot+429c200ffc8772bfe070@syzkaller.appspotmail.com Reported-by: syzbot+eec0c87f31a7c3b66f7b@syzkaller.appspotmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/x25/af_x25.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/x25/af_x25.c +++ b/net/x25/af_x25.c @@ -760,6 +760,10 @@ static int x25_connect(struct socket *so if (sk->sk_state == TCP_ESTABLISHED) goto out;
+ rc = -EALREADY; /* Do nothing if call is already in progress */ + if (sk->sk_state == TCP_SYN_SENT) + goto out; + sk->sk_state = TCP_CLOSE; sock->state = SS_UNCONNECTED;
@@ -806,7 +810,7 @@ static int x25_connect(struct socket *so /* Now the loop */ rc = -EINPROGRESS; if (sk->sk_state != TCP_ESTABLISHED && (flags & O_NONBLOCK)) - goto out_put_neigh; + goto out;
rc = x25_wait_for_connection_establishment(sk); if (rc)
On 1/28/20 6:59 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.16-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Tue, Jan 28, 2020 at 04:00:50PM -0700, shuah wrote:
On 1/28/20 6:59 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.16-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Great, thanks for testing all of these and letting me know.
greg k-h
On Tue, 28 Jan 2020 at 19:31, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.16-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 5.4.16-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.4.y git commit: 4acf9f18a8febb1cd7bd9c284ee494fdeb40ad96 git describe: v5.4.15-105-g4acf9f18a8fe Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.4-oe/build/v5.4.15-105-...
No regressions (compared to build v5.4.15)
No fixes (compared to build v5.4.15)
Ran 23853 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libgpiod * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * network-basic-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * ltp-hugetlb-tests * ltp-mm-tests * ltp-open-posix-tests * kvm-unit-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none * ssuite
On Wed, Jan 29, 2020 at 10:27:07AM +0530, Naresh Kamboju wrote:
On Tue, 28 Jan 2020 at 19:31, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.16-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing all of these and letting me know.
greg k-h
On 28/01/2020 13:59, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.16-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.4: 13 builds: 13 pass, 0 fail 22 boots: 22 pass, 0 fail 40 tests: 40 pass, 0 fail
Linux version: 5.4.16-rc1-g4acf9f18a8fe Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Cheers Jon
On Wed, Jan 29, 2020 at 01:16:44PM +0000, Jon Hunter wrote:
On 28/01/2020 13:59, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.16-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.4: 13 builds: 13 pass, 0 fail 22 boots: 22 pass, 0 fail 40 tests: 40 pass, 0 fail
Linux version: 5.4.16-rc1-g4acf9f18a8fe Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Thanks for testing all of these and letting me know.
greg k-h
On Tue, Jan 28, 2020 at 02:59:21PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
Build results: total: 158 pass: 158 fail: 0 Qemu test results: total: 388 pass: 388 fail: 0
Guenter
On Wed, Jan 29, 2020 at 06:43:56AM -0800, Guenter Roeck wrote:
On Tue, Jan 28, 2020 at 02:59:21PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.16 release. There are 104 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 30 Jan 2020 13:57:09 +0000. Anything received after that time might be too late.
Build results: total: 158 pass: 158 fail: 0 Qemu test results: total: 388 pass: 388 fail: 0
Wonderful, thanks for testing all of these and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org