This is the start of the stable review cycle for the 4.19.255 release. There are 32 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 11 Aug 2022 17:55:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.255-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.19.255-rc1
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/speculation: Add LFENCE to RSB fill sequence
Daniel Sneddon daniel.sneddon@linux.intel.com x86/speculation: Add RSB VM Exit protections
Ning Qiang sohu0106@126.com macintosh/adb: fix oob read in do_adb_query() function
Werner Sembach wse@tuxedocomputers.com ACPI: video: Shortening quirk list by identifying Clevo by board_name only
Werner Sembach wse@tuxedocomputers.com ACPI: video: Force backlight native for some TongFang devices
Ming Lei ming.lei@redhat.com scsi: core: Fix race between handling STS_RESOURCE and completion
Wei Mingzhi whistler@member.fsf.org mt7601u: add USB device ID for some versions of XiaoDu WiFi Dongle.
Greg Kroah-Hartman gregkh@linuxfoundation.org ARM: crypto: comment out gcc warning that breaks clang builds
Leo Yan leo.yan@linaro.org perf symbol: Correct address for bss symbols
Florian Westphal fw@strlen.de netfilter: nf_queue: do not allow packet truncation below transport header offset
Duoming Zhou duoming@zju.edu.cn sctp: fix sleep in atomic context bug in timer handlers
Michal Maloszewski michal.maloszewski@intel.com i40e: Fix interface init with MSI interrupts (no MSI-X)
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
Xin Long lucien.xin@gmail.com Documentation: fix sctp_wmem in ip-sysctl.rst
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_autocorking.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
Liang He windhl@126.com net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
Kuniyuki Iwashima kuniyu@amazon.com igmp: Fix data-races around sysctl_igmp_qrv.
Kuniyuki Iwashima kuniyu@amazon.com net: ping6: Fix memleak in ipv6_renew_options().
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
Liang He windhl@126.com scsi: ufs: host: Hold reference returned by of_parse_phandle()
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_nometrics_save.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_frto.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix a data-race around sysctl_tcp_app_win.
Kuniyuki Iwashima kuniyu@amazon.com tcp: Fix data-races around sysctl_tcp_dsack.
Harald Freudenberger freude@linux.ibm.com s390/archrandom: prevent CPACF trng invocations in interrupt context
ChenXiaoSong chenxiaosong2@huawei.com ntfs: fix use-after-free in ntfs_ucsncmp()
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 8 ++++ Documentation/networking/ip-sysctl.txt | 9 +++- Makefile | 4 +- arch/arm/lib/xor-neon.c | 3 +- arch/s390/include/asm/archrandom.h | 9 ++-- arch/x86/include/asm/cpufeatures.h | 2 + arch/x86/include/asm/msr-index.h | 4 ++ arch/x86/include/asm/nospec-branch.h | 19 ++++++++- arch/x86/kernel/cpu/bugs.c | 61 ++++++++++++++++++++++++++- arch/x86/kernel/cpu/common.c | 12 +++++- arch/x86/kvm/vmx.c | 6 +-- drivers/acpi/video_detect.c | 55 +++++++++++++++--------- drivers/macintosh/adb.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++ drivers/net/sungem_phy.c | 1 + drivers/net/wireless/mediatek/mt7601u/usb.c | 1 + drivers/scsi/scsi_lib.c | 3 +- drivers/scsi/ufs/ufshcd-pltfrm.c | 15 ++++++- fs/ntfs/attrib.c | 8 +++- include/net/bluetooth/l2cap.h | 1 + include/net/tcp.h | 2 +- net/bluetooth/l2cap_core.c | 61 +++++++++++++++++++++------ net/ipv4/igmp.c | 24 ++++++----- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_input.c | 20 +++++---- net/ipv4/tcp_metrics.c | 2 +- net/ipv4/tcp_output.c | 2 +- net/ipv6/ping.c | 6 +++ net/netfilter/nfnetlink_queue.c | 7 ++- net/sctp/stream_sched.c | 2 +- tools/perf/util/symbol-elf.c | 45 ++++++++++++++++++-- 31 files changed, 316 insertions(+), 84 deletions(-)
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit d0be8347c623e0ac4202a1d4e0373882821f56b0 upstream.
This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed.
refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18
Cc: stable@kernel.org Reported-by: Lee Jones lee.jones@linaro.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Tested-by: Lee Jones lee.jones@linaro.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/bluetooth/l2cap.h | 1 net/bluetooth/l2cap_core.c | 61 +++++++++++++++++++++++++++++++++--------- 2 files changed, 49 insertions(+), 13 deletions(-)
--- a/include/net/bluetooth/l2cap.h +++ b/include/net/bluetooth/l2cap.h @@ -798,6 +798,7 @@ enum { };
void l2cap_chan_hold(struct l2cap_chan *c); +struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c); void l2cap_chan_put(struct l2cap_chan *c);
static inline void l2cap_chan_lock(struct l2cap_chan *chan) --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -113,7 +113,8 @@ static struct l2cap_chan *__l2cap_get_ch }
/* Find channel with given SCID. - * Returns locked channel. */ + * Returns a reference locked channel. + */ static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn, u16 cid) { @@ -121,15 +122,19 @@ static struct l2cap_chan *l2cap_get_chan
mutex_lock(&conn->chan_lock); c = __l2cap_get_chan_by_scid(conn, cid); - if (c) - l2cap_chan_lock(c); + if (c) { + /* Only lock if chan reference is not 0 */ + c = l2cap_chan_hold_unless_zero(c); + if (c) + l2cap_chan_lock(c); + } mutex_unlock(&conn->chan_lock);
return c; }
/* Find channel with given DCID. - * Returns locked channel. + * Returns a reference locked channel. */ static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn, u16 cid) @@ -138,8 +143,12 @@ static struct l2cap_chan *l2cap_get_chan
mutex_lock(&conn->chan_lock); c = __l2cap_get_chan_by_dcid(conn, cid); - if (c) - l2cap_chan_lock(c); + if (c) { + /* Only lock if chan reference is not 0 */ + c = l2cap_chan_hold_unless_zero(c); + if (c) + l2cap_chan_lock(c); + } mutex_unlock(&conn->chan_lock);
return c; @@ -164,8 +173,12 @@ static struct l2cap_chan *l2cap_get_chan
mutex_lock(&conn->chan_lock); c = __l2cap_get_chan_by_ident(conn, ident); - if (c) - l2cap_chan_lock(c); + if (c) { + /* Only lock if chan reference is not 0 */ + c = l2cap_chan_hold_unless_zero(c); + if (c) + l2cap_chan_lock(c); + } mutex_unlock(&conn->chan_lock);
return c; @@ -491,6 +504,16 @@ void l2cap_chan_hold(struct l2cap_chan * kref_get(&c->kref); }
+struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c) +{ + BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref)); + + if (!kref_get_unless_zero(&c->kref)) + return NULL; + + return c; +} + void l2cap_chan_put(struct l2cap_chan *c) { BT_DBG("chan %p orig refcnt %d", c, kref_read(&c->kref)); @@ -1803,7 +1826,10 @@ static struct l2cap_chan *l2cap_global_c src_match = !bacmp(&c->src, src); dst_match = !bacmp(&c->dst, dst); if (src_match && dst_match) { - l2cap_chan_hold(c); + c = l2cap_chan_hold_unless_zero(c); + if (!c) + continue; + read_unlock(&chan_list_lock); return c; } @@ -1818,7 +1844,7 @@ static struct l2cap_chan *l2cap_global_c }
if (c1) - l2cap_chan_hold(c1); + c1 = l2cap_chan_hold_unless_zero(c1);
read_unlock(&chan_list_lock);
@@ -4204,6 +4230,7 @@ static inline int l2cap_config_req(struc
unlock: l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return err; }
@@ -4316,6 +4343,7 @@ static inline int l2cap_config_rsp(struc
done: l2cap_chan_unlock(chan); + l2cap_chan_put(chan); return err; }
@@ -5044,6 +5072,7 @@ send_move_response: l2cap_send_move_chan_rsp(chan, result);
l2cap_chan_unlock(chan); + l2cap_chan_put(chan);
return 0; } @@ -5136,6 +5165,7 @@ static void l2cap_move_continue(struct l }
l2cap_chan_unlock(chan); + l2cap_chan_put(chan); }
static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid, @@ -5165,6 +5195,7 @@ static void l2cap_move_fail(struct l2cap l2cap_send_move_chan_cfm(chan, L2CAP_MC_UNCONFIRMED);
l2cap_chan_unlock(chan); + l2cap_chan_put(chan); }
static int l2cap_move_channel_rsp(struct l2cap_conn *conn, @@ -5228,6 +5259,7 @@ static int l2cap_move_channel_confirm(st l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid);
l2cap_chan_unlock(chan); + l2cap_chan_put(chan);
return 0; } @@ -5263,6 +5295,7 @@ static inline int l2cap_move_channel_con }
l2cap_chan_unlock(chan); + l2cap_chan_put(chan);
return 0; } @@ -5635,12 +5668,11 @@ static inline int l2cap_le_credits(struc if (credits > max_credits) { BT_ERR("LE credits overflow"); l2cap_send_disconn_req(chan, ECONNRESET); - l2cap_chan_unlock(chan);
/* Return 0 so that we don't trigger an unnecessary * command reject packet. */ - return 0; + goto unlock; }
chan->tx_credits += credits; @@ -5651,7 +5683,9 @@ static inline int l2cap_le_credits(struc if (chan->tx_credits) chan->ops->resume(chan);
+unlock: l2cap_chan_unlock(chan); + l2cap_chan_put(chan);
return 0; } @@ -6949,6 +6983,7 @@ drop:
done: l2cap_chan_unlock(chan); + l2cap_chan_put(chan); }
static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm, @@ -7353,7 +7388,7 @@ static struct l2cap_chan *l2cap_global_f if (src_type != c->src_type) continue;
- l2cap_chan_hold(c); + c = l2cap_chan_hold_unless_zero(c); read_unlock(&chan_list_lock); return c; }
From: ChenXiaoSong chenxiaosong2@huawei.com
commit 38c9c22a85aeed28d0831f230136e9cf6fa2ed44 upstream.
Syzkaller reported use-after-free bug as follows:
================================================================== BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130 Read of size 2 at addr ffff8880751acee8 by task a.out/879
CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x1c0/0x2b0 print_address_description.constprop.0.cold+0xd4/0x484 print_report.cold+0x55/0x232 kasan_report+0xbf/0xf0 ntfs_ucsncmp+0x123/0x130 ntfs_are_names_equal.cold+0x2b/0x41 ntfs_attr_find+0x43b/0xb90 ntfs_attr_lookup+0x16d/0x1e0 ntfs_read_locked_attr_inode+0x4aa/0x2360 ntfs_attr_iget+0x1af/0x220 ntfs_read_locked_inode+0x246c/0x5120 ntfs_iget+0x132/0x180 load_system_files+0x1cc6/0x3480 ntfs_fill_super+0xa66/0x1cf0 mount_bdev+0x38d/0x460 legacy_get_tree+0x10d/0x220 vfs_get_tree+0x93/0x300 do_new_mount+0x2da/0x6d0 path_mount+0x496/0x19d0 __x64_sys_mount+0x284/0x300 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f3f2118d9ea Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00 RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44 R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK>
The buggy address belongs to the physical page: page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac memcg:ffff888101f7e180 anon flags: 0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201 raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^ ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ==================================================================
The reason is that struct ATTR_RECORD->name_offset is 6485, end address of name string is out of bounds.
Fix this by adding sanity check on end address of attribute name string.
[akpm@linux-foundation.org: coding-style cleanups] [chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei] Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Cc: Anton Altaparmakov anton@tuxera.com Cc: ChenXiaoSong chenxiaosong2@huawei.com Cc: Yongqiang Liu liuyongqiang13@huawei.com Cc: Zhang Yi yi.zhang@huawei.com Cc: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ntfs/attrib.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ntfs/attrib.c +++ b/fs/ntfs/attrib.c @@ -606,8 +606,12 @@ static int ntfs_attr_find(const ATTR_TYP a = (ATTR_RECORD*)((u8*)ctx->attr + le32_to_cpu(ctx->attr->length)); for (;; a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))) { - if ((u8*)a < (u8*)ctx->mrec || (u8*)a > (u8*)ctx->mrec + - le32_to_cpu(ctx->mrec->bytes_allocated)) + u8 *mrec_end = (u8 *)ctx->mrec + + le32_to_cpu(ctx->mrec->bytes_allocated); + u8 *name_end = (u8 *)a + le16_to_cpu(a->name_offset) + + a->name_length * sizeof(ntfschar); + if ((u8*)a < (u8*)ctx->mrec || (u8*)a > mrec_end || + name_end > mrec_end) break; ctx->attr = a; if (unlikely(le32_to_cpu(a->type) > le32_to_cpu(type) ||
From: Harald Freudenberger freude@linux.ibm.com
commit 918e75f77af7d2e049bb70469ec0a2c12782d96a upstream.
This patch slightly reworks the s390 arch_get_random_seed_{int,long} implementation: Make sure the CPACF trng instruction is never called in any interrupt context. This is done by adding an additional condition in_task().
Justification:
There are some constrains to satisfy for the invocation of the arch_get_random_seed_{int,long}() functions: - They should provide good random data during kernel initialization. - They should not be called in interrupt context as the TRNG instruction is relatively heavy weight and may for example make some network loads cause to timeout and buck.
However, it was not clear what kind of interrupt context is exactly encountered during kernel init or network traffic eventually calling arch_get_random_seed_long().
After some days of investigations it is clear that the s390 start_kernel function is not running in any interrupt context and so the trng is called:
Jul 11 18:33:39 t35lp54 kernel: [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70 Jul 11 18:33:39 t35lp54 kernel: [<000000010715f246>] random_init+0xf6/0x238 Jul 11 18:33:39 t35lp54 kernel: [<000000010712545c>] start_kernel+0x4a4/0x628 Jul 11 18:33:39 t35lp54 kernel: [<000000010590402a>] startup_continue+0x2a/0x40
The condition in_task() is true and the CPACF trng provides random data during kernel startup.
The network traffic however, is more difficult. A typical call stack looks like this:
Jul 06 17:37:07 t35lp54 kernel: [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240 Jul 06 17:37:07 t35lp54 kernel: [<000000008b560136>] crng_reseed+0x36/0xd8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b5604b8>] crng_make_state+0x78/0x340 Jul 06 17:37:07 t35lp54 kernel: [<000000008b5607e0>] _get_random_bytes+0x60/0xf8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b56108a>] get_random_u32+0xda/0x248 Jul 06 17:37:07 t35lp54 kernel: [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8 Jul 06 17:37:07 t35lp54 kernel: [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8 Jul 06 17:37:07 t35lp54 kernel: [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b611eac>] kmalloc_reserve+0x44/0xa0 Jul 06 17:37:07 t35lp54 kernel: [<000000008b611f98>] __alloc_skb+0x90/0x178 Jul 06 17:37:07 t35lp54 kernel: [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118 Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680 Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f6526>] qeth_poll+0x256/0x3f8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b63dbec>] net_rx_action+0x1cc/0x408 Jul 06 17:37:07 t35lp54 kernel: [<000000008b937302>] __do_softirq+0x132/0x6b0 Jul 06 17:37:07 t35lp54 kernel: [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170 Jul 06 17:37:07 t35lp54 kernel: [<000000008abf531a>] irq_exit_rcu+0x22/0x50 Jul 06 17:37:07 t35lp54 kernel: [<000000008b922506>] do_io_irq+0xe6/0x198 Jul 06 17:37:07 t35lp54 kernel: [<000000008b935826>] io_int_handler+0xd6/0x110 Jul 06 17:37:07 t35lp54 kernel: [<000000008b9358a6>] psw_idle_exit+0x0/0xa Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0) Jul 06 17:37:07 t35lp54 kernel: [<000000008b933cfe>] default_idle_call+0x6e/0xd0 Jul 06 17:37:07 t35lp54 kernel: [<000000008ac59f4e>] do_idle+0xf6/0x1b0 Jul 06 17:37:07 t35lp54 kernel: [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40 Jul 06 17:37:07 t35lp54 kernel: [<000000008abb0d90>] smp_start_secondary+0x148/0x158 Jul 06 17:37:07 t35lp54 kernel: [<000000008b935b9e>] restart_int_handler+0x6e/0x90
which confirms that the call is in softirq context. So in_task() covers exactly the cases where we want to have CPACF trng called: not in nmi, not in hard irq, not in soft irq but in normal task context and during kernel init.
Signed-off-by: Harald Freudenberger freude@linux.ibm.com Acked-by: Jason A. Donenfeld Jason@zx2c4.com Reviewed-by: Juergen Christ jchrist@linux.ibm.com Link: https://lore.kernel.org/r/20220713131721.257907-1-freude@linux.ibm.com Fixes: e4f74400308c ("s390/archrandom: simplify back to earlier design and initialize earlier") [agordeev@linux.ibm.com changed desc, added Fixes and Link, removed -stable] Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/archrandom.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/s390/include/asm/archrandom.h +++ b/arch/s390/include/asm/archrandom.h @@ -2,7 +2,7 @@ /* * Kernel interface for the s390 arch_random_* functions * - * Copyright IBM Corp. 2017, 2020 + * Copyright IBM Corp. 2017, 2022 * * Author: Harald Freudenberger freude@de.ibm.com * @@ -14,6 +14,7 @@ #ifdef CONFIG_ARCH_RANDOM
#include <linux/static_key.h> +#include <linux/preempt.h> #include <linux/atomic.h> #include <asm/cpacf.h>
@@ -32,7 +33,8 @@ static inline bool __must_check arch_get
static inline bool __must_check arch_get_random_seed_long(unsigned long *v) { - if (static_branch_likely(&s390_arch_random_available)) { + if (static_branch_likely(&s390_arch_random_available) && + in_task()) { cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v)); atomic64_add(sizeof(*v), &s390_arch_random_counter); return true; @@ -42,7 +44,8 @@ static inline bool __must_check arch_get
static inline bool __must_check arch_get_random_seed_int(unsigned int *v) { - if (static_branch_likely(&s390_arch_random_available)) { + if (static_branch_likely(&s390_arch_random_available) && + in_task()) { cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v)); atomic64_add(sizeof(*v), &s390_arch_random_counter); return true;
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 58ebb1c8b35a8ef38cd6927431e0fa7b173a632d upstream.
While reading sysctl_tcp_dsack, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4197,7 +4197,7 @@ static void tcp_dsack_set(struct sock *s { struct tcp_sock *tp = tcp_sk(sk);
- if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) { int mib_idx;
if (before(seq, tp->rcv_nxt)) @@ -4232,7 +4232,7 @@ static void tcp_send_dupack(struct sock NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOST); tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS);
- if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_dsack)) { u32 end_seq = TCP_SKB_CB(skb)->end_seq;
if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 02ca527ac5581cf56749db9fd03d854e842253dd upstream.
While reading sysctl_tcp_app_win, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -432,7 +432,7 @@ static void tcp_grow_window(struct sock */ void tcp_init_buffer_space(struct sock *sk) { - int tcp_app_win = sock_net(sk)->ipv4.sysctl_tcp_app_win; + int tcp_app_win = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_app_win); struct tcp_sock *tp = tcp_sk(sk); int maxwin;
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 36eeee75ef0157e42fb6593dcc65daab289b559e upstream.
While reading sysctl_tcp_adv_win_scale, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1355,7 +1355,7 @@ void tcp_select_initial_window(const str
static inline int tcp_win_from_space(const struct sock *sk, int space) { - int tcp_adv_win_scale = sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale; + int tcp_adv_win_scale = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale);
return tcp_adv_win_scale <= 0 ? (space>>(-tcp_adv_win_scale)) :
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 706c6202a3589f290e1ef9be0584a8f4a3cc0507 upstream.
While reading sysctl_tcp_frto, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2018,7 +2018,7 @@ void tcp_enter_loss(struct sock *sk) * loss recovery is underway except recurring timeout(s) on * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing */ - tp->frto = net->ipv4.sysctl_tcp_frto && + tp->frto = READ_ONCE(net->ipv4.sysctl_tcp_frto) && (new_recovery || icsk->icsk_retransmits) && !inet_csk(sk)->icsk_mtup.probe_size; }
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 8499a2454d9e8a55ce616ede9f9580f36fd5b0f3 upstream.
While reading sysctl_tcp_nometrics_save, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_metrics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -329,7 +329,7 @@ void tcp_update_metrics(struct sock *sk) int m;
sk_dst_confirm(sk); - if (net->ipv4.sysctl_tcp_nometrics_save || !dst) + if (READ_ONCE(net->ipv4.sysctl_tcp_nometrics_save) || !dst) return;
rcu_read_lock();
From: Liang He windhl@126.com
commit a3435afba87dc6cd83f5595e7607f3c40f93ef01 upstream.
In ufshcd_populate_vreg(), we should hold the reference returned by of_parse_phandle() and then use it to call of_node_put() for refcount balance.
Link: https://lore.kernel.org/r/20220719071529.1081166-1-windhl@126.com Fixes: aa4976130934 ("ufs: Add regulator enable support") Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Liang He windhl@126.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/ufs/ufshcd-pltfrm.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
--- a/drivers/scsi/ufs/ufshcd-pltfrm.c +++ b/drivers/scsi/ufs/ufshcd-pltfrm.c @@ -124,9 +124,20 @@ out: return ret; }
+static bool phandle_exists(const struct device_node *np, + const char *phandle_name, int index) +{ + struct device_node *parse_np = of_parse_phandle(np, phandle_name, index); + + if (parse_np) + of_node_put(parse_np); + + return parse_np != NULL; +} + #define MAX_PROP_SIZE 32 static int ufshcd_populate_vreg(struct device *dev, const char *name, - struct ufs_vreg **out_vreg) + struct ufs_vreg **out_vreg) { int ret = 0; char prop_name[MAX_PROP_SIZE]; @@ -139,7 +150,7 @@ static int ufshcd_populate_vreg(struct d }
snprintf(prop_name, MAX_PROP_SIZE, "%s-supply", name); - if (!of_parse_phandle(np, prop_name, 0)) { + if (!phandle_exists(np, prop_name, 0)) { dev_info(dev, "%s: Unable to find %s regulator, assuming enabled\n", __func__, prop_name); goto out;
From: Kuniyuki Iwashima kuniyu@amazon.com
commit db3815a2fa691da145cfbe834584f31ad75df9ff upstream.
While reading sysctl_tcp_challenge_ack_limit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3468,7 +3468,7 @@ static void tcp_send_challenge_ack(struc /* Then check host-wide RFC 5961 rate limit. */ now = jiffies / HZ; if (now != challenge_timestamp) { - u32 ack_limit = net->ipv4.sysctl_tcp_challenge_ack_limit; + u32 ack_limit = READ_ONCE(net->ipv4.sysctl_tcp_challenge_ack_limit); u32 half = (ack_limit + 1) >> 1;
challenge_timestamp = now;
From: Kuniyuki Iwashima kuniyu@amazon.com
commit e27326009a3d247b831eda38878c777f6f4eb3d1 upstream.
When we close ping6 sockets, some resources are left unfreed because pingv6_prot is missing sk->sk_prot->destroy(). As reported by syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr; char data[24] = {0}; int fd;
hdr = (struct ipv6_sr_hdr *)data; hdr->hdrlen = 2; hdr->type = IPV6_SRCRT_TYPE_4;
fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP); setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24); close(fd);
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of net.ipv4.ping_group_range. The default value is [1, 0] so that no GID meets the condition (1 <= GID <= 0). Thus, the local DoS does not succeed until we change the default value. However, at least Ubuntu/Fedora/RHEL loosen it.
$ cat /usr/lib/sysctl.d/50-default.conf ... -net.ipv4.ping_group_range = 0 2147483647
Also, there could be another path reported with these options, and some of them require CAP_NET_RAW.
setsockopt IPV6_ADDRFORM (inet6_sk(sk)->pktoptions) IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu) IPV6_HOPOPTS (inet6_sk(sk)->opt) IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt) IPV6_RTHDR (inet6_sk(sk)->opt) IPV6_DSTOPTS (inet6_sk(sk)->opt) IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
getsockopt IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
For the record, I left a different splat with syzbot's one.
unreferenced object 0xffff888006270c60 (size 96): comm "repro2", pid 231, jiffies 4294696626 (age 13.118s) hex dump (first 32 bytes): 01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554) [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715) [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024) [<000000007096a025>] __sys_setsockopt (net/socket.c:2254) [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262) [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com Reported-by: Ayushman Dutta ayudutta@amazon.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: David Ahern dsahern@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ping.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -27,6 +27,11 @@ #include <linux/proc_fs.h> #include <net/ping.h>
+static void ping_v6_destroy(struct sock *sk) +{ + inet6_destroy_sock(sk); +} + /* Compatibility glue so we can support IPv6 when it's compiled as a module */ static int dummy_ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) @@ -170,6 +175,7 @@ struct proto pingv6_prot = { .owner = THIS_MODULE, .init = ping_init_sock, .close = ping_close, + .destroy = ping_v6_destroy, .connect = ip6_datagram_connect_v6_only, .disconnect = __udp_disconnect, .setsockopt = ipv6_setsockopt,
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 8ebcc62c738f68688ee7c6fec2efe5bc6d3d7e60 ]
While reading sysctl_igmp_qrv, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
This test can be packed into a helper, so such changes will be in the follow-up series after net is merged into net-next.
qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv);
Fixes: a9fe8e29945d ("ipv4: implement igmp_qrv sysctl to tune igmp robustness variable") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/igmp.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index b831825f234f..a08acb54b6b0 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -831,7 +831,7 @@ static void igmp_ifc_event(struct in_device *in_dev) struct net *net = dev_net(in_dev->dev); if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) return; - WRITE_ONCE(in_dev->mr_ifc_count, in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv); + WRITE_ONCE(in_dev->mr_ifc_count, in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv)); igmp_ifc_start_timer(in_dev, 1); }
@@ -1013,7 +1013,7 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb, * received value was zero, use the default or statically * configured value. */ - in_dev->mr_qrv = ih3->qrv ?: net->ipv4.sysctl_igmp_qrv; + in_dev->mr_qrv = ih3->qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); in_dev->mr_qi = IGMPV3_QQIC(ih3->qqic)*HZ ?: IGMP_QUERY_INTERVAL;
/* RFC3376, 8.3. Query Response Interval: @@ -1192,7 +1192,7 @@ static void igmpv3_add_delrec(struct in_device *in_dev, struct ip_mc_list *im) pmc->interface = im->interface; in_dev_hold(in_dev); pmc->multiaddr = im->multiaddr; - pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + pmc->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); pmc->sfmode = im->sfmode; if (pmc->sfmode == MCAST_INCLUDE) { struct ip_sf_list *psf; @@ -1243,9 +1243,11 @@ static void igmpv3_del_delrec(struct in_device *in_dev, struct ip_mc_list *im) swap(im->tomb, pmc->tomb); swap(im->sources, pmc->sources); for (psf = im->sources; psf; psf = psf->sf_next) - psf->sf_crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + psf->sf_crcount = in_dev->mr_qrv ?: + READ_ONCE(net->ipv4.sysctl_igmp_qrv); } else { - im->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + im->crcount = in_dev->mr_qrv ?: + READ_ONCE(net->ipv4.sysctl_igmp_qrv); } in_dev_put(pmc->interface); kfree_pmc(pmc); @@ -1347,7 +1349,7 @@ static void igmp_group_added(struct ip_mc_list *im) if (in_dev->dead) return;
- im->unsolicit_count = net->ipv4.sysctl_igmp_qrv; + im->unsolicit_count = READ_ONCE(net->ipv4.sysctl_igmp_qrv); if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) { spin_lock_bh(&im->lock); igmp_start_timer(im, IGMP_INITIAL_REPORT_DELAY); @@ -1361,7 +1363,7 @@ static void igmp_group_added(struct ip_mc_list *im) * IN() to IN(A). */ if (im->sfmode == MCAST_EXCLUDE) - im->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + im->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv);
igmp_ifc_event(in_dev); #endif @@ -1769,7 +1771,7 @@ static void ip_mc_reset(struct in_device *in_dev)
in_dev->mr_qi = IGMP_QUERY_INTERVAL; in_dev->mr_qri = IGMP_QUERY_RESPONSE_INTERVAL; - in_dev->mr_qrv = net->ipv4.sysctl_igmp_qrv; + in_dev->mr_qrv = READ_ONCE(net->ipv4.sysctl_igmp_qrv); } #else static void ip_mc_reset(struct in_device *in_dev) @@ -1903,7 +1905,7 @@ static int ip_mc_del1_src(struct ip_mc_list *pmc, int sfmode, #ifdef CONFIG_IP_MULTICAST if (psf->sf_oldin && !IGMP_V1_SEEN(in_dev) && !IGMP_V2_SEEN(in_dev)) { - psf->sf_crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + psf->sf_crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); psf->sf_next = pmc->tomb; pmc->tomb = psf; rv = 1; @@ -1967,7 +1969,7 @@ static int ip_mc_del_src(struct in_device *in_dev, __be32 *pmca, int sfmode, /* filter mode change */ pmc->sfmode = MCAST_INCLUDE; #ifdef CONFIG_IP_MULTICAST - pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + pmc->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); WRITE_ONCE(in_dev->mr_ifc_count, pmc->crcount); for (psf = pmc->sources; psf; psf = psf->sf_next) psf->sf_crcount = 0; @@ -2146,7 +2148,7 @@ static int ip_mc_add_src(struct in_device *in_dev, __be32 *pmca, int sfmode, #ifdef CONFIG_IP_MULTICAST /* else no filters; keep old mode for reports */
- pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + pmc->crcount = in_dev->mr_qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); WRITE_ONCE(in_dev->mr_ifc_count, pmc->crcount); for (psf = pmc->sources; psf; psf = psf->sf_next) psf->sf_crcount = 0;
From: Liang He windhl@126.com
[ Upstream commit ebbbe23fdf6070e31509638df3321688358cc211 ]
In bcm5421_init(), we should call of_node_put() for the reference returned by of_get_parent() which has increased the refcount.
Fixes: 3c326fe9cb7a ("[PATCH] ppc64: Add new PHY to sungem") Signed-off-by: Liang He windhl@126.com Link: https://lore.kernel.org/r/20220720131003.1287426-1-windhl@126.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/sungem_phy.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/sungem_phy.c b/drivers/net/sungem_phy.c index 63a8ff816e59..e556b00dfed2 100644 --- a/drivers/net/sungem_phy.c +++ b/drivers/net/sungem_phy.c @@ -453,6 +453,7 @@ static int bcm5421_init(struct mii_phy* phy) int can_low_power = 1; if (np == NULL || of_get_property(np, "no-autolowpower", NULL)) can_low_power = 0; + of_node_put(np); if (can_low_power) { /* Enable automatic low-power */ sungem_phy_write(phy, 0x1c, 0x9002);
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit e0bb4ab9dfddd872622239f49fb2bd403b70853b ]
While reading sysctl_tcp_min_tso_segs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 13d9e8570ce5..3090b61e4edd 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1745,7 +1745,7 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
min_tso = ca_ops->min_tso_segs ? ca_ops->min_tso_segs(sk) : - sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs; + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
tso_segs = tcp_tso_autosize(sk, mss_now, min_tso); return min_t(u32, tso_segs, sk->sk_gso_max_segs);
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 1330ffacd05fc9ac4159d19286ce119e22450ed2 ]
While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f2dac14caeb3..04788bd5e82c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2903,7 +2903,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag) { - u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ; + u32 wlen = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen) * HZ; struct tcp_sock *tp = tcp_sk(sk);
if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) {
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 85225e6f0a76e6745bc841c9f25169c509b573d8 ]
While reading sysctl_tcp_autocorking, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 7acc0d07f148..768a7daab559 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -706,7 +706,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb, int size_goal) { return skb->len < size_goal && - sock_net(sk)->ipv4.sysctl_tcp_autocorking && + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_autocorking) && !tcp_rtx_queue_empty(sk) && refcount_read(&sk->sk_wmem_alloc) > skb->truesize; }
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 2afdbe7b8de84c28e219073a6661080e1b3ded48 ]
While reading sysctl_tcp_invalid_ratelimit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_input.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 04788bd5e82c..38412d93400a 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3420,7 +3420,8 @@ static bool __tcp_oow_rate_limited(struct net *net, int mib_idx, if (*last_oow_ack_time) { s32 elapsed = (s32)(tcp_jiffies32 - *last_oow_ack_time);
- if (0 <= elapsed && elapsed < net->ipv4.sysctl_tcp_invalid_ratelimit) { + if (0 <= elapsed && + elapsed < READ_ONCE(net->ipv4.sysctl_tcp_invalid_ratelimit)) { NET_INC_STATS(net, mib_idx); return true; /* rate-limited: don't send yet! */ }
From: Xin Long lucien.xin@gmail.com
[ Upstream commit aa709da0e032cee7c202047ecd75f437bb0126ed ]
Since commit 1033990ac5b2 ("sctp: implement memory accounting on tx path"), SCTP has supported memory accounting on tx path where 'sctp_wmem' is used by sk_wmem_schedule(). So we should fix the description for this option in ip-sysctl.rst accordingly.
v1->v2: - Improve the description as Marcelo suggested.
Fixes: 1033990ac5b2 ("sctp: implement memory accounting on tx path") Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/ip-sysctl.txt | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index e315e6052b9f..eb24b8137226 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -2170,7 +2170,14 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max Default: 4K
sctp_wmem - vector of 3 INTEGERs: min, default, max - Currently this tunable has no effect. + Only the first value ("min") is used, "default" and "max" are + ignored. + + min: Minimum size of send buffer that can be used by SCTP sockets. + It is guaranteed to each SCTP socket (but not association) even + under moderate memory pressure. + + Default: 4K
addr_scope_policy - INTEGER Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 4866b2b0f7672b6d760c4b8ece6fb56f965dcc8a ]
While reading sysctl_tcp_comp_sack_delay_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_input.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 38412d93400a..b71866aba426 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5262,7 +5262,8 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) if (tp->srtt_us && tp->srtt_us < rtt) rtt = tp->srtt_us;
- delay = min_t(unsigned long, sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns, + delay = min_t(unsigned long, + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns), rtt * (NSEC_PER_USEC >> 3)/20); sock_hold(sk); hrtimer_start(&tp->compressed_ack_timer, ns_to_ktime(delay),
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 79f55473bfc8ac51bd6572929a679eeb4da22251 ]
While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b71866aba426..e1d065ea5a15 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5239,7 +5239,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) }
if (!tcp_is_sack(tp) || - tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr) + tp->compressed_ack >= READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)) goto send_now;
if (tp->compressed_ack_rcv_nxt != tp->rcv_nxt) {
From: Michal Maloszewski michal.maloszewski@intel.com
[ Upstream commit 5fcbb711024aac6d4db385623e6f2fdf019f7782 ]
Fix the inability to bring an interface up on a setup with only MSI interrupts enabled (no MSI-X). Solution is to add a default number of QPs = 1. This is enough, since without MSI-X support driver enables only a basic feature set.
Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use") Signed-off-by: Dawid Lukwinski dawid.lukwinski@intel.com Signed-off-by: Michal Maloszewski michal.maloszewski@intel.com Tested-by: Dave Switzer david.switzer@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20220722175401.112572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 3615c6533cf4..2f3b393e5506 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -1808,11 +1808,15 @@ static void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi, * non-zero req_queue_pairs says that user requested a new * queue count via ethtool's set_channels, so use this * value for queues distribution across traffic classes + * We need at least one queue pair for the interface + * to be usable as we see in else statement. */ if (vsi->req_queue_pairs > 0) vsi->num_queue_pairs = vsi->req_queue_pairs; else if (pf->flags & I40E_FLAG_MSIX_ENABLED) vsi->num_queue_pairs = pf->num_lan_msix; + else + vsi->num_queue_pairs = 1; }
/* Number of queues per enabled TC */
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit b89fc26f741d9f9efb51cba3e9b241cf1380ec5a ]
There are sleep in atomic context bugs in timer handlers of sctp such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(), sctp_generate_t1_init_event(), sctp_generate_timeout_event(), sctp_generate_t3_rtx_event() and so on.
The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter that may sleep could be called by different timer handlers which is in interrupt context.
One of the call paths that could trigger bug is shown below:
(interrupt context) sctp_generate_probe_event sctp_do_sm sctp_side_effects sctp_cmd_interpreter sctp_outq_teardown sctp_outq_init sctp_sched_set_sched n->init_sid(..,GFP_KERNEL) sctp_sched_prio_init_sid //may sleep
This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched() from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic context bugs.
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/stream_sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c index a6c04a94b08f..3a5c0d00e96c 100644 --- a/net/sctp/stream_sched.c +++ b/net/sctp/stream_sched.c @@ -178,7 +178,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc, if (!SCTP_SO(&asoc->stream, i)->ext) continue;
- ret = n->init_sid(&asoc->stream, i, GFP_KERNEL); + ret = n->init_sid(&asoc->stream, i, GFP_ATOMIC); if (ret) goto err; }
From: Florian Westphal fw@strlen.de
[ Upstream commit 99a63d36cb3ed5ca3aa6fcb64cffbeaf3b0fb164 ]
Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute.
The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook.
If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0).
Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano pwnzer0tt1@proton.me Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink_queue.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index a5aff2834bd6..cd496b074a71 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -850,11 +850,16 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) }
static int -nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff) +nfqnl_mangle(void *data, unsigned int data_len, struct nf_queue_entry *e, int diff) { struct sk_buff *nskb;
if (diff < 0) { + unsigned int min_len = skb_transport_offset(e->skb); + + if (data_len < min_len) + return -EINVAL; + if (pskb_trim(e->skb, data_len)) return -ENOMEM; } else if (diff > 0) {
From: Leo Yan leo.yan@linaro.org
[ Upstream commit 2d86612aacb7805f72873691a2644d7279ed0630 ]
When using 'perf mem' and 'perf c2c', an issue is observed that tool reports the wrong offset for global data symbols. This is a common issue on both x86 and Arm64 platforms.
Let's see an example, for a test program, below is the disassembly for its .bss section which is dumped with objdump:
...
Disassembly of section .bss:
0000000000004040 <completed.0>: ...
0000000000004080 <buf1>: ...
00000000000040c0 <buf2>: ...
0000000000004100 <thread>: ...
First we used 'perf mem record' to run the test program and then used 'perf --debug verbose=4 mem report' to observe what's the symbol info for 'buf1' and 'buf2' structures.
# ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8 # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf2 0x30a8-0x30e8 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf1 0x3068-0x30a8 ...
The perf tool relies on libelf to parse symbols, in executable and shared object files, 'st_value' holds a virtual address; 'sh_addr' is the address at which section's first byte should reside in memory, and 'sh_offset' is the byte offset from the beginning of the file to the first byte in the section. The perf tool uses below formula to convert a symbol's memory address to a file address:
file_address = st_value - sh_addr + sh_offset ^ ` Memory address
We can see the final adjusted address ranges for buf1 and buf2 are [0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is incorrect, in the code, the structure for 'buf1' and 'buf2' specifies compiler attribute with 64-byte alignment.
The problem happens for 'sh_offset', libelf returns it as 0x3028 which is not 64-byte aligned, combining with disassembly, it's likely libelf doesn't respect the alignment for .bss section, therefore, it doesn't return the aligned value for 'sh_offset'.
Suggested by Fangrui Song, ELF file contains program header which contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD segments contain the execution info. A better choice for converting memory address to file address is using the formula:
file_address = st_value - p_vaddr + p_offset
This patch introduces elf_read_program_header() which returns the program header based on the passed 'st_value', then it uses the formula above to calculate the symbol file address; and the debugging log is updated respectively.
After applying the change:
# ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf2 0x30c0-0x3100 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf1 0x3080-0x30c0 ...
Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing") Reported-by: Chang Rui changruinj@gmail.com Suggested-by: Fangrui Song maskray@google.com Signed-off-by: Leo Yan leo.yan@linaro.org Acked-by: Namhyung Kim namhyung@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Peter Zijlstra peterz@infradead.org Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/symbol-elf.c | 45 ++++++++++++++++++++++++++++++++---- 1 file changed, 41 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c index 166c621e0223..bd33d6613929 100644 --- a/tools/perf/util/symbol-elf.c +++ b/tools/perf/util/symbol-elf.c @@ -201,6 +201,33 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep, return NULL; }
+static int elf_read_program_header(Elf *elf, u64 vaddr, GElf_Phdr *phdr) +{ + size_t i, phdrnum; + u64 sz; + + if (elf_getphdrnum(elf, &phdrnum)) + return -1; + + for (i = 0; i < phdrnum; i++) { + if (gelf_getphdr(elf, i, phdr) == NULL) + return -1; + + if (phdr->p_type != PT_LOAD) + continue; + + sz = max(phdr->p_memsz, phdr->p_filesz); + if (!sz) + continue; + + if (vaddr >= phdr->p_vaddr && (vaddr < phdr->p_vaddr + sz)) + return 0; + } + + /* Not found any valid program header */ + return -1; +} + static bool want_demangle(bool is_kernel_sym) { return is_kernel_sym ? symbol_conf.demangle_kernel : symbol_conf.demangle; @@ -1063,6 +1090,7 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss, sym.st_value); used_opd = true; } + /* * When loading symbols in a data mapping, ABS symbols (which * has a value of SHN_ABS in its st_shndx) failed at @@ -1099,11 +1127,20 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss, goto out_elf_end; } else if ((used_opd && runtime_ss->adjust_symbols) || (!used_opd && syms_ss->adjust_symbols)) { + GElf_Phdr phdr; + + if (elf_read_program_header(syms_ss->elf, + (u64)sym.st_value, &phdr)) { + pr_warning("%s: failed to find program header for " + "symbol: %s st_value: %#" PRIx64 "\n", + __func__, elf_name, (u64)sym.st_value); + continue; + } pr_debug4("%s: adjusting symbol: st_value: %#" PRIx64 " " - "sh_addr: %#" PRIx64 " sh_offset: %#" PRIx64 "\n", __func__, - (u64)sym.st_value, (u64)shdr.sh_addr, - (u64)shdr.sh_offset); - sym.st_value -= shdr.sh_addr - shdr.sh_offset; + "p_vaddr: %#" PRIx64 " p_offset: %#" PRIx64 "\n", + __func__, (u64)sym.st_value, (u64)phdr.p_vaddr, + (u64)phdr.p_offset); + sym.st_value -= phdr.p_vaddr - phdr.p_offset; }
demangled = demangle_sym(dso, kmodule, elf_name);
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
The gcc build warning prevents all clang-built kernels from working properly, so comment it out to fix the build.
This is a -stable kernel only patch for now, it will be resolved differently in mainline releases in the future.
Cc: "Jason A. Donenfeld" Jason@zx2c4.com Cc: "Justin M. Forbes" jforbes@fedoraproject.org Cc: Ard Biesheuvel ardb@kernel.org Acked-by: Arnd Bergmann arnd@arndb.de Cc: Nicolas Pitre nico@linaro.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/lib/xor-neon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm/lib/xor-neon.c +++ b/arch/arm/lib/xor-neon.c @@ -29,8 +29,9 @@ MODULE_LICENSE("GPL"); * While older versions of GCC do not generate incorrect code, they fail to * recognize the parallel nature of these functions, and emit plain ARM code, * which is known to be slower than the optimized ARM code in asm-arm/xor.h. + * + * #warning This code requires at least version 4.6 of GCC */ -#warning This code requires at least version 4.6 of GCC #endif
#pragma GCC diagnostic ignored "-Wunused-variable"
From: Wei Mingzhi whistler@member.fsf.org
commit 829eea7c94e0bac804e65975639a2f2e5f147033 upstream.
USB device ID of some versions of XiaoDu WiFi Dongle is 2955:1003 instead of 2955:1001. Both are the same mt7601u hardware.
Signed-off-by: Wei Mingzhi whistler@member.fsf.org Acked-by: Jakub Kicinski kubakici@wp.pl Signed-off-by: Kalle Valo kvalo@codeaurora.org Link: https://lore.kernel.org/r/20210618160840.305024-1-whistler@member.fsf.org Cc: Yan Xinyu sdlyyxy@bupt.edu.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/mediatek/mt7601u/usb.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/wireless/mediatek/mt7601u/usb.c +++ b/drivers/net/wireless/mediatek/mt7601u/usb.c @@ -34,6 +34,7 @@ static const struct usb_device_id mt7601 { USB_DEVICE(0x2717, 0x4106) }, { USB_DEVICE(0x2955, 0x0001) }, { USB_DEVICE(0x2955, 0x1001) }, + { USB_DEVICE(0x2955, 0x1003) }, { USB_DEVICE(0x2a5f, 0x1000) }, { USB_DEVICE(0x7392, 0x7710) }, { 0, }
From: Ming Lei ming.lei@redhat.com
commit 673235f915318ced5d7ec4b2bfd8cb909e6a4a55 upstream.
When queuing I/O request to LLD, STS_RESOURCE may be returned because:
- Host is in recovery or blocked
- Target queue throttling or target is blocked
- LLD rejection
In these scenarios BLK_STS_DEV_RESOURCE is returned to the block layer to avoid an unnecessary re-run of the queue. However, all of the requests queued to this SCSI device may complete immediately after reading 'sdev->device_busy' and BLK_STS_DEV_RESOURCE is returned to block layer. In that case the current I/O won't get a chance to get queued since it is invisible at that time for both scsi_run_queue_async() and blk-mq's RESTART.
Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation.
Link: https://lore.kernel.org/r/20201202100419.525144-1-ming.lei@redhat.com Fixes: 86ff7c2a80cd ("blk-mq: introduce BLK_STS_DEV_RESOURCE") Cc: Hannes Reinecke hare@suse.com Cc: Sumit Saxena sumit.saxena@broadcom.com Cc: Kashyap Desai kashyap.desai@broadcom.com Cc: Bart Van Assche bvanassche@acm.org Cc: Ewan Milne emilne@redhat.com Cc: Long Li longli@microsoft.com Reported-by: John Garry john.garry@huawei.com Tested-by: "chenxiang (M)" chenxiang66@hisilicon.com Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/scsi_lib.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2157,8 +2157,7 @@ out_put_budget: case BLK_STS_OK: break; case BLK_STS_RESOURCE: - if (atomic_read(&sdev->device_busy) || - scsi_device_blocked(sdev)) + if (scsi_device_blocked(sdev)) ret = BLK_STS_DEV_RESOURCE; break; default:
From: Werner Sembach wse@tuxedocomputers.com
commit c752089f7cf5b5800c6ace4cdd1a8351ee78a598 upstream.
The TongFang PF5PU1G, PF4NU1F, PF5NU1G, and PF5LUXG/TUXEDO BA15 Gen10, Pulse 14/15 Gen1, and Pulse 15 Gen2 have the same problem as the Clevo NL5xRU and NL5xNU/TUXEDO Aura 15 Gen1 and Gen2: They have a working native and video interface. However the default detection mechanism first registers the video interface before unregistering it again and switching to the native interface during boot. This results in a dangling SBIOS request for backlight change for some reason, causing the backlight to switch to ~2% once per boot on the first power cord connect or disconnect event. Setting the native interface explicitly circumvents this buggy behaviour by avoiding the unregistering process.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: All applicable stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/video_detect.c | 51 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 50 insertions(+), 1 deletion(-)
--- a/drivers/acpi/video_detect.c +++ b/drivers/acpi/video_detect.c @@ -431,7 +431,56 @@ static const struct dmi_system_id video_ DMI_MATCH(DMI_BOARD_NAME, "NL5xNU"), }, }, - + /* + * The TongFang PF5PU1G, PF4NU1F, PF5NU1G, and PF5LUXG/TUXEDO BA15 Gen10, + * Pulse 14/15 Gen1, and Pulse 15 Gen2 have the same problem as the Clevo + * NL5xRU and NL5xNU/TUXEDO Aura 15 Gen1 and Gen2. See the description + * above. + */ + { + .callback = video_detect_force_native, + .ident = "TongFang PF5PU1G", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "PF5PU1G"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang PF4NU1F", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "PF4NU1F"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang PF4NU1F", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "PULSE1401"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang PF5NU1G", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "PF5NU1G"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang PF5NU1G", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "PULSE1501"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang PF5LUXG", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "PF5LUXG"), + }, + }, /* * Desktops which falsely report a backlight and which our heuristics * for this do not catch.
From: Werner Sembach wse@tuxedocomputers.com
commit f0341e67b3782603737f7788e71bd3530012a4f4 upstream.
Taking a recent change in the i8042 quirklist to this one: Clevo board_names are somewhat unique, and if not: The generic Board_-/Sys_Vendor string "Notebook" doesn't help much anyway. So identifying the devices just by the board_name helps keeping the list significantly shorter and might even hit more devices requiring the fix.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Fixes: c844d22fe0c0 ("ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU") Cc: All applicable stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/video_detect.c | 34 ---------------------------------- 1 file changed, 34 deletions(-)
--- a/drivers/acpi/video_detect.c +++ b/drivers/acpi/video_detect.c @@ -371,23 +371,6 @@ static const struct dmi_system_id video_ .callback = video_detect_force_native, .ident = "Clevo NL5xRU", .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), - DMI_MATCH(DMI_BOARD_NAME, "NL5xRU"), - }, - }, - { - .callback = video_detect_force_native, - .ident = "Clevo NL5xRU", - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "SchenkerTechnologiesGmbH"), - DMI_MATCH(DMI_BOARD_NAME, "NL5xRU"), - }, - }, - { - .callback = video_detect_force_native, - .ident = "Clevo NL5xRU", - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), DMI_MATCH(DMI_BOARD_NAME, "NL5xRU"), }, }, @@ -411,23 +394,6 @@ static const struct dmi_system_id video_ .callback = video_detect_force_native, .ident = "Clevo NL5xNU", .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), - DMI_MATCH(DMI_BOARD_NAME, "NL5xNU"), - }, - }, - { - .callback = video_detect_force_native, - .ident = "Clevo NL5xNU", - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "SchenkerTechnologiesGmbH"), - DMI_MATCH(DMI_BOARD_NAME, "NL5xNU"), - }, - }, - { - .callback = video_detect_force_native, - .ident = "Clevo NL5xNU", - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), DMI_MATCH(DMI_BOARD_NAME, "NL5xNU"), }, },
From: Ning Qiang sohu0106@126.com
commit fd97e4ad6d3b0c9fce3bca8ea8e6969d9ce7423b upstream.
In do_adb_query() function of drivers/macintosh/adb.c, req->data is copied form userland. The parameter "req->data[2]" is missing check, the array size of adb_handler[] is 16, so adb_handler[req->data[2]].original_address and adb_handler[req->data[2]].handler_id will lead to oob read.
Cc: stable stable@kernel.org Signed-off-by: Ning Qiang sohu0106@126.com Reviewed-by: Kees Cook keescook@chromium.org Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Acked-by: Benjamin Herrenschmidt benh@kernel.crashing.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220713153734.2248-1-sohu0106@126.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/macintosh/adb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/macintosh/adb.c +++ b/drivers/macintosh/adb.c @@ -645,7 +645,7 @@ do_adb_query(struct adb_request *req)
switch(req->data[1]) { case ADB_QUERY_GETDEVINFO: - if (req->nbytes < 3) + if (req->nbytes < 3 || req->data[2] >= 16) break; mutex_lock(&adb_handler_mutex); req->reply[0] = adb_handler[req->data[2]].original_address;
From: Daniel Sneddon daniel.sneddon@linux.intel.com
commit 2b1299322016731d56807aa49254a5ea3080b6b3 upstream.
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as documented for RET instructions after VM exits. Mitigate it with a new one-entry RSB stuffing mechanism and a new LFENCE.
== Background ==
Indirect Branch Restricted Speculation (IBRS) was designed to help mitigate Branch Target Injection and Speculative Store Bypass, i.e. Spectre, attacks. IBRS prevents software run in less privileged modes from affecting branch prediction in more privileged modes. IBRS requires the MSR to be written on every privilege level change.
To overcome some of the performance issues of IBRS, Enhanced IBRS was introduced. eIBRS is an "always on" IBRS, in other words, just turn it on once instead of writing the MSR on every privilege level change. When eIBRS is enabled, more privileged modes should be protected from less privileged modes, including protecting VMMs from guests.
== Problem ==
Here's a simplification of how guests are run on Linux' KVM:
void run_kvm_guest(void) { // Prepare to run guest VMRESUME(); // Clean up after guest runs }
The execution flow for that would look something like this to the processor:
1. Host-side: call run_kvm_guest() 2. Host-side: VMRESUME 3. Guest runs, does "CALL guest_function" 4. VM exit, host runs again 5. Host might make some "cleanup" function calls 6. Host-side: RET from run_kvm_guest()
Now, when back on the host, there are a couple of possible scenarios of post-guest activity the host needs to do before executing host code:
* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not touched and Linux has to do a 32-entry stuffing.
* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host IBRS=1 shortly after VM exit, has a documented side effect of flushing the RSB except in this PBRSB situation where the software needs to stuff the last RSB entry "by hand".
IOW, with eIBRS supported, host RET instructions should no longer be influenced by guest behavior after the host retires a single CALL instruction.
However, if the RET instructions are "unbalanced" with CALLs after a VM exit as is the RET in #6, it might speculatively use the address for the instruction after the CALL in #3 as an RSB prediction. This is a problem since the (untrusted) guest controls this address.
Balanced CALL/RET instruction pairs such as in step #5 are not affected.
== Solution ==
The PBRSB issue affects a wide variety of Intel processors which support eIBRS. But not all of them need mitigation. Today, X86_FEATURE_RETPOLINE triggers an RSB filling sequence that mitigates PBRSB. Systems setting RETPOLINE need no further mitigation - i.e., eIBRS systems which enable retpoline explicitly.
However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RETPOLINE and most of them need a new mitigation.
Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE which triggers a lighter-weight PBRSB mitigation versus RSB Filling at vmexit.
The lighter-weight mitigation performs a CALL instruction which is immediately followed by a speculative execution barrier (INT3). This steers speculative execution to the barrier -- just like a retpoline -- which ensures that speculation can never reach an unbalanced RET. Then, ensure this CALL is retired before continuing execution with an LFENCE.
In other words, the window of exposure is opened at VM exit where RET behavior is troublesome. While the window is open, force RSB predictions sampling for RET targets to a dead end at the INT3. Close the window with the LFENCE.
There is a subset of eIBRS systems which are not vulnerable to PBRSB. Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB. Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
[ bp: Massage, incorporate review comments from Andy Cooper. ] [ Pawan: Update commit message to replace RSB_VMEXIT with RETPOLINE ]
Signed-off-by: Daniel Sneddon daniel.sneddon@linux.intel.com Co-developed-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/hw-vuln/spectre.rst | 8 +++ arch/x86/include/asm/cpufeatures.h | 2 arch/x86/include/asm/msr-index.h | 4 + arch/x86/include/asm/nospec-branch.h | 15 ++++++ arch/x86/kernel/cpu/bugs.c | 61 +++++++++++++++++++++++++- arch/x86/kernel/cpu/common.c | 12 ++++- arch/x86/kvm/vmx.c | 6 +- 7 files changed, 102 insertions(+), 6 deletions(-)
--- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -422,6 +422,14 @@ The possible values in this file are: 'RSB filling' Protection of RSB on context switch enabled ============= ===========================================
+ - EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status: + + =========================== ======================================================= + 'PBRSB-eIBRS: SW sequence' CPU is affected and protection of RSB on VMEXIT enabled + 'PBRSB-eIBRS: Vulnerable' CPU is vulnerable + 'PBRSB-eIBRS: Not affected' CPU is not affected by PBRSB + =========================== ======================================================= + Full mitigation might require a microcode update from the CPU vendor. When the necessary microcode is not available, the kernel will report vulnerability. --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -283,6 +283,7 @@ #define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */ #define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */ #define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */ +#define X86_FEATURE_RSB_VMEXIT_LITE (11*32+ 6) /* "" Fill RSB on VM exit when EIBRS is enabled */
/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ #define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ @@ -395,5 +396,6 @@ #define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */ #define X86_BUG_SRBDS X86_BUG(24) /* CPU may leak RNG bits if not mitigated */ #define X86_BUG_MMIO_STALE_DATA X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */ +#define X86_BUG_EIBRS_PBRSB X86_BUG(26) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -120,6 +120,10 @@ * bit available to control VERW * behavior. */ +#define ARCH_CAP_PBRSB_NO BIT(24) /* + * Not susceptible to Post-Barrier + * Return Stack Buffer Predictions. + */
#define MSR_IA32_FLUSH_CMD 0x0000010b #define L1D_FLUSH BIT(0) /* --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -54,6 +54,14 @@ jnz 771b; \ add $(BITS_PER_LONG/8) * nr, sp;
+/* Sequence to mitigate PBRSB on eIBRS CPUs */ +#define __ISSUE_UNBALANCED_RET_GUARD(sp) \ + call 881f; \ + int3; \ +881: \ + add $(BITS_PER_LONG/8), sp; \ + lfence; + #ifdef __ASSEMBLY__
/* @@ -269,6 +277,13 @@ static inline void vmexit_fill_RSB(void) : "=r" (loops), ASM_CALL_CONSTRAINT : : "memory" ); #endif + asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE + ALTERNATIVE("jmp 920f", + __stringify(__ISSUE_UNBALANCED_RET_GUARD(%0)), + X86_FEATURE_RSB_VMEXIT_LITE) + "920:" + : ASM_CALL_CONSTRAINT + : : "memory" ); }
static __always_inline --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1043,6 +1043,49 @@ static enum spectre_v2_mitigation __init return SPECTRE_V2_RETPOLINE; }
+static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode) +{ + /* + * Similar to context switches, there are two types of RSB attacks + * after VM exit: + * + * 1) RSB underflow + * + * 2) Poisoned RSB entry + * + * When retpoline is enabled, both are mitigated by filling/clearing + * the RSB. + * + * When IBRS is enabled, while #1 would be mitigated by the IBRS branch + * prediction isolation protections, RSB still needs to be cleared + * because of #2. Note that SMEP provides no protection here, unlike + * user-space-poisoned RSB entries. + * + * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB + * bug is present then a LITE version of RSB protection is required, + * just a single call needs to retire before a RET is executed. + */ + switch (mode) { + case SPECTRE_V2_NONE: + /* These modes already fill RSB at vmexit */ + case SPECTRE_V2_LFENCE: + case SPECTRE_V2_RETPOLINE: + case SPECTRE_V2_EIBRS_RETPOLINE: + return; + + case SPECTRE_V2_EIBRS_LFENCE: + case SPECTRE_V2_EIBRS: + if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) { + setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE); + pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n"); + } + return; + } + + pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit"); + dump_stack(); +} + static void __init spectre_v2_select_mitigation(void) { enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline(); @@ -1135,6 +1178,8 @@ static void __init spectre_v2_select_mit setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
+ spectre_v2_determine_rsb_fill_type_at_vmexit(mode); + /* * Retpoline means the kernel is safe because it has no indirect * branches. Enhanced IBRS protects firmware too, so, enable restricted @@ -1867,6 +1912,19 @@ static char *ibpb_state(void) return ""; }
+static char *pbrsb_eibrs_state(void) +{ + if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) { + if (boot_cpu_has(X86_FEATURE_RSB_VMEXIT_LITE) || + boot_cpu_has(X86_FEATURE_RETPOLINE)) + return ", PBRSB-eIBRS: SW sequence"; + else + return ", PBRSB-eIBRS: Vulnerable"; + } else { + return ", PBRSB-eIBRS: Not affected"; + } +} + static ssize_t spectre_v2_show_state(char *buf) { if (spectre_v2_enabled == SPECTRE_V2_LFENCE) @@ -1879,12 +1937,13 @@ static ssize_t spectre_v2_show_state(cha spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE) return sprintf(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
- return sprintf(buf, "%s%s%s%s%s%s\n", + return sprintf(buf, "%s%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], ibpb_state(), boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", stibp_state(), boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "", + pbrsb_eibrs_state(), spectre_v2_module_string()); }
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -954,6 +954,7 @@ static void identify_cpu_without_cpuid(s #define MSBDS_ONLY BIT(5) #define NO_SWAPGS BIT(6) #define NO_ITLB_MULTIHIT BIT(7) +#define NO_EIBRS_PBRSB BIT(8)
#define VULNWL(_vendor, _family, _model, _whitelist) \ { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist } @@ -990,7 +991,7 @@ static const __initconst struct x86_cpu_
VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT), VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT), + VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
/* * Technically, swapgs isn't serializing on AMD (despite it previously @@ -1000,7 +1001,9 @@ static const __initconst struct x86_cpu_ * good enough for our purposes. */
- VULNWL_INTEL(ATOM_TREMONT_X, NO_ITLB_MULTIHIT), + VULNWL_INTEL(ATOM_TREMONT, NO_EIBRS_PBRSB), + VULNWL_INTEL(ATOM_TREMONT_L, NO_EIBRS_PBRSB), + VULNWL_INTEL(ATOM_TREMONT_X, NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
/* AMD Family 0xf - 0x12 */ VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), @@ -1154,6 +1157,11 @@ static void __init cpu_set_bug_bits(stru !arch_cap_mmio_immune(ia32_cap)) setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
+ if (cpu_has(c, X86_FEATURE_IBRS_ENHANCED) && + !cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) && + !(ia32_cap & ARCH_CAP_PBRSB_NO)) + setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB); + if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN)) return;
--- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -10988,6 +10988,9 @@ static void __noclone vmx_vcpu_run(struc #endif );
+ /* Eliminate branch target predictions from guest mode */ + vmexit_fill_RSB(); + vmx_enable_fb_clear(vmx);
/* @@ -11010,9 +11013,6 @@ static void __noclone vmx_vcpu_run(struc
x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
- /* Eliminate branch target predictions from guest mode */ - vmexit_fill_RSB(); - /* All fields are clean at this point */ if (static_branch_unlikely(&enable_evmcs)) current_evmcs->hv_clean_fields |=
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit ba6e31af2be96c4d0536f2152ed6f7b6c11bca47 upstream.
RSB fill sequence does not have any protection for miss-prediction of conditional branch at the end of the sequence. CPU can speculatively execute code immediately after the sequence, while RSB filling hasn't completed yet.
#define __FILL_RETURN_BUFFER(reg, nr, sp) \ mov $(nr/2), reg; \ 771: \ call 772f; \ 773: /* speculation trap */ \ pause; \ lfence; \ jmp 773b; \ 772: \ call 774f; \ 775: /* speculation trap */ \ pause; \ lfence; \ jmp 775b; \ 774: \ dec reg; \ jnz 771b; <----- CPU can miss-predict here. \ add $(BITS_PER_LONG/8) * nr, sp;
Before RSB is filled, RETs that come in program order after this macro can be executed speculatively, making them vulnerable to RSB-based attacks.
Mitigate it by adding an LFENCE after the conditional branch to prevent speculation while RSB is being filled.
Suggested-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -52,7 +52,9 @@ 774: \ dec reg; \ jnz 771b; \ - add $(BITS_PER_LONG/8) * nr, sp; + add $(BITS_PER_LONG/8) * nr, sp; \ + /* barrier for jnz misprediction */ \ + lfence;
/* Sequence to mitigate PBRSB on eIBRS CPUs */ #define __ISSUE_UNBALANCED_RET_GUARD(sp) \
Hi!
This is the start of the stable review cycle for the 4.19.255 release. There are 32 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
On Tue, 9 Aug 2022 at 23:32, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.19.255 release. There are 32 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 11 Aug 2022 17:55:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.255-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro's test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
NOTE: Following warnings were noticed on arm64 and arm
WARNING: modpost: Found 1 section mismatch(es). To see full details build your kernel with: 'make CONFIG_DEBUG_SECTION_MISMATCH=y' aarch64-linux-gnu-ld: warning: -z norelro ignored aarch64-linux-gnu-ld: warning: .tmp_vmlinux1 has a LOAD segment with RWX permissions aarch64-linux-gnu-ld: warning: -z norelro ignored aarch64-linux-gnu-ld: warning: .tmp_vmlinux2 has a LOAD segment with RWX permissions aarch64-linux-gnu-ld: warning: -z norelro ignored aarch64-linux-gnu-ld: warning: vmlinux has a LOAD segment with RWX permissions
This was reported on earlier stable rc reviews ref: https://lore.kernel.org/lkml/CA+G9fYuxx3wdLXiKhYAPEs-g6uxPn-OsyaiHQOvjuegVES...
## Build * kernel: 4.19.255-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-4.19.y * git commit: 02c6011ece11c67e9ec89b3d3e0c25cff42b3ea0 * git describe: v4.19.254-33-g02c6011ece11 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19....
## No test Regressions (compared to v4.19.253-63-gf68ffa0f9e2a)
## No metric Regressions (compared to v4.19.253-63-gf68ffa0f9e2a)
## No test Fixes (compared to v4.19.253-63-gf68ffa0f9e2a)
## No Metric Fixes (compared to v4.19.253-63-gf68ffa0f9e2a)
## Test result summary total: 66365, pass: 57931, fail: 287, skip: 7425, xfail: 722
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 291 total, 286 passed, 5 failed * arm64: 58 total, 57 passed, 1 failed * i386: 26 total, 25 passed, 1 failed * mips: 35 total, 35 passed, 0 failed * parisc: 12 total, 12 passed, 0 failed * powerpc: 54 total, 54 passed, 0 failed * s390: 12 total, 12 passed, 0 failed * sh: 24 total, 24 passed, 0 failed * sparc: 12 total, 12 passed, 0 failed * x86_64: 52 total, 51 passed, 1 failed
## Test suites summary * fwts * igt-gpu-tools * kunit * kvm-unit-tests * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * packetdrill * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On Tue, Aug 09, 2022 at 07:59:51PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.255 release. There are 32 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 11 Aug 2022 17:55:02 +0000. Anything received after that time might be too late.
Build test (gcc version 11.3.1 20220807): mips: 63 configs -> no failure arm: 115 configs -> no failure arm64: 2 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
[1]. https://openqa.qa.codethink.co.uk/tests/1616
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Tue, Aug 09, 2022 at 07:59:51PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.255 release. There are 32 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 11 Aug 2022 17:55:02 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 425 pass: 425 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Tue, 09 Aug 2022 19:59:51 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.255 release. There are 32 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 11 Aug 2022 17:55:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.255-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v4.19: 10 builds: 10 pass, 0 fail 22 boots: 22 pass, 0 fail 40 tests: 40 pass, 0 fail
Linux version: 4.19.255-rc1-g02c6011ece11 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 8/9/22 11:59 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.255 release. There are 32 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 11 Aug 2022 17:55:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.255-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org