This is the start of the stable review cycle for the 5.15.45 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.45-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.15.45-rc1
Kumar Kartikeya Dwivedi memxor@gmail.com bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
Kumar Kartikeya Dwivedi memxor@gmail.com bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access
Yuntao Wang ytcoode@gmail.com bpf: Fix excessive memory allocation in stack_map_alloc()
Liu Jian liujian56@huawei.com bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes
Yuntao Wang ytcoode@gmail.com bpf: Fix potential array overflow in bpf_trampoline_get_progs()
Chuck Lever chuck.lever@oracle.com NFSD: Fix possible sleep during nfsd4_release_lockowner()
Trond Myklebust trond.myklebust@hammerspace.com NFS: Memory allocation failures are not server fatal errors
Akira Yokosawa akiyks@gmail.com docs: submitting-patches: Fix crossref to 'The canonical patch format'
Xiu Jianfeng xiujianfeng@huawei.com tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
Stefan Mahnke-Hartmann stefan.mahnke-hartmann@infineon.com tpm: Fix buffer access in tpm2_get_tpm_pt()
Bryan O'Donoghue bryan.odonoghue@linaro.org media: i2c: imx412: Fix power_off ordering
Bryan O'Donoghue bryan.odonoghue@linaro.org media: i2c: imx412: Fix reset GPIO polarity
Reinette Chatre reinette.chatre@intel.com x86/sgx: Ensure no data in PCMD page after truncate
Reinette Chatre reinette.chatre@intel.com x86/sgx: Fix race between reclaimer and page fault handler
Reinette Chatre reinette.chatre@intel.com x86/sgx: Obtain backing storage page with enclave mutex held
Reinette Chatre reinette.chatre@intel.com x86/sgx: Mark PCMD page as dirty when modifying contents
Reinette Chatre reinette.chatre@intel.com x86/sgx: Disconnect backing page references from dirty status
Tao Jin tao-j@outlook.com HID: multitouch: add quirks to enable Lenovo X12 trackpoint
Marek Maślanka mm@semihalf.com HID: multitouch: Add support for Google Whiskers Touchpad
Randy Dunlap rdunlap@infradead.org fs/ntfs3: validate BOOT sectors_per_clusters
Mariusz Tkaczyk mariusz.tkaczyk@linux.intel.com raid5: introduce MD_BROKEN
Sarthak Kukreti sarthakkukreti@google.com dm verity: set DM_TARGET_IMMUTABLE feature flag
Mikulas Patocka mpatocka@redhat.com dm stats: add cond_resched when looping over entries
Mikulas Patocka mpatocka@redhat.com dm crypt: make printing of the key constant-time
Dan Carpenter dan.carpenter@oracle.com dm integrity: fix error code in dm_integrity_ctr()
Jonathan Bakker xc-racer2@live.ca ARM: dts: s5pv210: Correct interrupt name for bluetooth in Aries
Steven Rostedt rostedt@goodmis.org Bluetooth: hci_qca: Use del_timer_sync() before freeing
Craig McLure craig@mclure.net ALSA: usb-audio: Configure sync endpoints before data
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Add missing ep_idx in fixed EP quirks
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Workaround for clock setup on TEAC devices
Sultan Alsawaf sultan@kerneltoast.com zsmalloc: fix races between asynchronous zspage free and page migration
Vitaly Chikunov vt@altlinux.org crypto: ecrdsa - Fix incorrect use of vli_cmp
Fabio Estevam festevam@denx.de crypto: caam - fix i.MX6SX entropy delay value
Ashish Kalra ashish.kalra@amd.com KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
Sean Christopherson seanjc@google.com KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2
Sean Christopherson seanjc@google.com KVM: x86: avoid calling x86 emulator without a decoded instruction
Paolo Bonzini pbonzini@redhat.com x86, kvm: use correct GFP flags for preemption disabled
Sean Christopherson seanjc@google.com x86/kvm: Alloc dummy async #PF token outside of raw spinlock
Xiaomeng Tong xiam0nd.tong@gmail.com KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator
Florian Westphal fw@strlen.de netfilter: conntrack: re-fetch conntrack after insertion
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: double hook unregistration in netns path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: hold mutex on netns pre_exit path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: sanitize nft_set_desc_concat_parse()
Nicolai Stange nstange@suse.de crypto: drbg - make reseeding from get_random_bytes() synchronous
Nicolai Stange nstange@suse.de crypto: drbg - move dynamic ->reseed_threshold adjustments to __drbg_seed()
Nicolai Stange nstange@suse.de crypto: drbg - track whether DRBG was seeded with !rng_is_initialized()
Nicolai Stange nstange@suse.de crypto: drbg - prepare for more fine-grained tracking of seeding state
Justin M. Forbes jforbes@fedoraproject.org lib/crypto: add prompts back to crypto libraries
Yuezhang Mo Yuezhang.Mo@sony.com exfat: fix referencing wrong parent directory information after renaming
Tadeusz Struk tadeusz.struk@linaro.org exfat: check if cluster num is valid
Gustavo A. R. Silva gustavoars@kernel.org drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()
Alex Elder elder@linaro.org net: ipa: compute proper aggregation limit
David Howells dhowells@redhat.com pipe: Fix missing lock in pipe_resize_ring()
Kuniyuki Iwashima kuniyu@amazon.co.jp pipe: make poll_usage boolean and annotate its access
Stephen Brennan stephen.s.brennan@oracle.com assoc_array: Fix BUG_ON during garbage collect
Dan Carpenter dan.carpenter@oracle.com i2c: ismt: prevent memory corruption in ismt_access()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: disallow non-stateful expression in sets earlier
Piyush Malgujar pmalgujar@marvell.com drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
Mika Westerberg mika.westerberg@linux.intel.com i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
Joel Stanley joel@jms.id.au net: ftgmac100: Disable hardware checksum on AST2600
Lin Ma linma@zju.edu.cn nfc: pn533: Fix buggy cleanup order
Thomas Bartschies thomas.bartschies@cvk.de net: af_key: check encryption module availability consistency
Al Viro viro@zeniv.linux.org.uk percpu_ref_init(): clean ->percpu_count_ref on failure
Quentin Perret qperret@google.com KVM: arm64: Don't hypercall before EL2 init
IotaHydrae writeforever@foxmail.com pinctrl: sunxi: fix f1c100s uart2 function
Forest Crossman cyrozap@gmail.com ALSA: usb-audio: Don't get sample rate for MCT Trigger 5 USB-to-HDMI
-------------
Diffstat:
Documentation/process/submitting-patches.rst | 2 +- Makefile | 4 +- arch/arm/boot/dts/s5pv210-aries.dtsi | 2 +- arch/arm64/kvm/arm.c | 3 +- arch/powerpc/kvm/book3s_hv_uvmem.c | 8 +- arch/x86/kernel/cpu/sgx/encl.c | 113 ++++++++++++++++++++++++-- arch/x86/kernel/cpu/sgx/encl.h | 2 +- arch/x86/kernel/cpu/sgx/main.c | 13 ++- arch/x86/kernel/kvm.c | 41 ++++++---- arch/x86/kvm/svm/nested.c | 3 - arch/x86/kvm/svm/sev.c | 12 +-- arch/x86/kvm/vmx/nested.c | 3 - arch/x86/kvm/x86.c | 31 ++++--- crypto/Kconfig | 2 - crypto/drbg.c | 110 ++++++++++--------------- crypto/ecrdsa.c | 8 +- drivers/bluetooth/hci_qca.c | 4 +- drivers/char/random.c | 2 - drivers/char/tpm/tpm2-cmd.c | 11 ++- drivers/char/tpm/tpm_ibmvtpm.c | 1 + drivers/crypto/caam/ctrl.c | 18 ++++ drivers/gpu/drm/i915/intel_pm.c | 2 +- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-multitouch.c | 9 ++ drivers/i2c/busses/i2c-ismt.c | 17 ++++ drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 + drivers/md/dm-crypt.c | 14 +++- drivers/md/dm-integrity.c | 2 - drivers/md/dm-stats.c | 8 ++ drivers/md/dm-verity-target.c | 1 + drivers/md/raid5.c | 47 +++++------ drivers/media/i2c/imx412.c | 8 +- drivers/net/ethernet/faraday/ftgmac100.c | 5 ++ drivers/net/ipa/ipa_endpoint.c | 4 +- drivers/nfc/pn533/pn533.c | 5 +- drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c | 2 +- fs/exfat/balloc.c | 8 +- fs/exfat/exfat_fs.h | 8 ++ fs/exfat/fatent.c | 8 -- fs/exfat/namei.c | 27 +----- fs/nfs/internal.h | 1 + fs/nfsd/nfs4state.c | 12 +-- fs/ntfs3/super.c | 10 ++- fs/pipe.c | 33 ++++---- include/crypto/drbg.h | 10 ++- include/linux/pipe_fs_i.h | 2 +- include/net/netfilter/nf_conntrack_core.h | 7 +- kernel/bpf/stackmap.c | 1 - kernel/bpf/trampoline.c | 18 ++-- kernel/bpf/verifier.c | 17 +++- lib/Kconfig | 2 + lib/assoc_array.c | 8 ++ lib/crypto/Kconfig | 17 ++-- lib/percpu-refcount.c | 1 + mm/zsmalloc.c | 37 ++++++++- net/core/filter.c | 4 +- net/key/af_key.c | 6 +- net/netfilter/nf_tables_api.c | 94 +++++++++++++++------ sound/usb/clock.c | 7 ++ sound/usb/pcm.c | 17 ++-- sound/usb/quirks-table.h | 3 + sound/usb/quirks.c | 2 + 62 files changed, 583 insertions(+), 296 deletions(-)
From: Forest Crossman cyrozap@gmail.com
[ Upstream commit d7be213849232a2accb219d537edf056d29186b4 ]
This device doesn't support reading the sample rate, so we need to apply this quirk to avoid a 15-second delay waiting for three timeouts.
Signed-off-by: Forest Crossman cyrozap@gmail.com Link: https://lore.kernel.org/r/20220504002444.114011-2-cyrozap@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c index ab9f3da49941..fbbe59054c3f 100644 --- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -1822,6 +1822,8 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = { QUIRK_FLAG_IGNORE_CTL_ERROR), DEVICE_FLG(0x06f8, 0xd002, /* Hercules DJ Console (Macintosh Edition) */ QUIRK_FLAG_IGNORE_CTL_ERROR), + DEVICE_FLG(0x0711, 0x5800, /* MCT Trigger 5 USB-to-HDMI */ + QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x074d, 0x3553, /* Outlaw RR2150 (Micronas UAC3553B) */ QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x08bb, 0x2702, /* LineX FM Transmitter */
From: IotaHydrae writeforever@foxmail.com
[ Upstream commit fa8785e5931367e2b43f2c507f26bcf3e281c0ca ]
Change suniv f1c100s pinctrl,PD14 multiplexing function lvds1 to uart2
When the pin PD13 and PD14 is setting up to uart2 function in dts, there's an error occurred: 1c20800.pinctrl: unsupported function uart2 on pin PD14
Because 'uart2' is not any one multiplexing option of PD14, and pinctrl don't know how to configure it.
So change the pin PD14 lvds1 function to uart2.
Signed-off-by: IotaHydrae writeforever@foxmail.com Reviewed-by: Andre Przywara andre.przywara@arm.com Link: https://lore.kernel.org/r/tencent_70C1308DDA794C81CAEF389049055BACEC09@qq.co... Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c b/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c index 2801ca706273..68a5b627fb9b 100644 --- a/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c +++ b/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c @@ -204,7 +204,7 @@ static const struct sunxi_desc_pin suniv_f1c100s_pins[] = { SUNXI_FUNCTION(0x0, "gpio_in"), SUNXI_FUNCTION(0x1, "gpio_out"), SUNXI_FUNCTION(0x2, "lcd"), /* D20 */ - SUNXI_FUNCTION(0x3, "lvds1"), /* RX */ + SUNXI_FUNCTION(0x3, "uart2"), /* RX */ SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 14)), SUNXI_PIN(SUNXI_PINCTRL_PIN(D, 15), SUNXI_FUNCTION(0x0, "gpio_in"),
From: Quentin Perret qperret@google.com
[ Upstream commit 2e40316753ee552fb598e8da8ca0d20a04e67453 ]
Will reported the following splat when running with Protected KVM enabled:
[ 2.427181] ------------[ cut here ]------------ [ 2.427668] WARNING: CPU: 3 PID: 1 at arch/arm64/kvm/mmu.c:489 __create_hyp_private_mapping+0x118/0x1ac [ 2.428424] Modules linked in: [ 2.429040] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc2-00084-g8635adc4efc7 #1 [ 2.429589] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 2.430286] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.430734] pc : __create_hyp_private_mapping+0x118/0x1ac [ 2.431091] lr : create_hyp_exec_mappings+0x40/0x80 [ 2.431377] sp : ffff80000803baf0 [ 2.431597] x29: ffff80000803bb00 x28: 0000000000000000 x27: 0000000000000000 [ 2.432156] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 2.432561] x23: ffffcd96c343b000 x22: 0000000000000000 x21: ffff80000803bb40 [ 2.433004] x20: 0000000000000004 x19: 0000000000001800 x18: 0000000000000000 [ 2.433343] x17: 0003e68cf7efdd70 x16: 0000000000000004 x15: fffffc81f602a2c8 [ 2.434053] x14: ffffdf8380000000 x13: ffffcd9573200000 x12: ffffcd96c343b000 [ 2.434401] x11: 0000000000000004 x10: ffffcd96c1738000 x9 : 0000000000000004 [ 2.434812] x8 : ffff80000803bb40 x7 : 7f7f7f7f7f7f7f7f x6 : 544f422effff306b [ 2.435136] x5 : 000000008020001e x4 : ffff207d80a88c00 x3 : 0000000000000005 [ 2.435480] x2 : 0000000000001800 x1 : 000000014f4ab800 x0 : 000000000badca11 [ 2.436149] Call trace: [ 2.436600] __create_hyp_private_mapping+0x118/0x1ac [ 2.437576] create_hyp_exec_mappings+0x40/0x80 [ 2.438180] kvm_init_vector_slots+0x180/0x194 [ 2.458941] kvm_arch_init+0x80/0x274 [ 2.459220] kvm_init+0x48/0x354 [ 2.459416] arm_init+0x20/0x2c [ 2.459601] do_one_initcall+0xbc/0x238 [ 2.459809] do_initcall_level+0x94/0xb4 [ 2.460043] do_initcalls+0x54/0x94 [ 2.460228] do_basic_setup+0x1c/0x28 [ 2.460407] kernel_init_freeable+0x110/0x178 [ 2.460610] kernel_init+0x20/0x1a0 [ 2.460817] ret_from_fork+0x10/0x20 [ 2.461274] ---[ end trace 0000000000000000 ]---
Indeed, the Protected KVM mode promotes __create_hyp_private_mapping() to a hypercall as EL1 no longer has access to the hypervisor's stage-1 page-table. However, the call from kvm_init_vector_slots() happens after pKVM has been initialized on the primary CPU, but before it has been initialized on secondaries. As such, if the KVM initcall procedure is migrated from one CPU to another in this window, the hypercall may end up running on a CPU for which EL2 has not been initialized.
Fortunately, the pKVM hypervisor doesn't rely on the host to re-map the vectors in the private range, so the hypercall in question is in fact superfluous. Skip it when pKVM is enabled.
Reported-by: Will Deacon will@kernel.org Signed-off-by: Quentin Perret qperret@google.com [maz: simplified the checks slightly] Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220513092607.35233-1-qperret@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kvm/arm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 0b2f684cd8ca..a30c036577a3 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1458,7 +1458,8 @@ static int kvm_init_vector_slots(void) base = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs)); kvm_init_vector_slot(base, HYP_VECTOR_SPECTRE_DIRECT);
- if (kvm_system_needs_idmapped_vectors() && !has_vhe()) { + if (kvm_system_needs_idmapped_vectors() && + !is_protected_kvm_enabled()) { err = create_hyp_exec_mappings(__pa_symbol(__bp_harden_hyp_vecs), __BP_HARDEN_HYP_VECS_SZ, &base); if (err)
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit a91714312eb16f9ecd1f7f8b3efe1380075f28d4 ]
That way percpu_ref_exit() is safe after failing percpu_ref_init(). At least one user (cgroup_create()) had a double-free that way; there might be other similar bugs. Easier to fix in percpu_ref_init(), rather than playing whack-a-mole in sloppy users...
Usual symptoms look like a messed refcounting in one of subsystems that use percpu allocations (might be percpu-refcount, might be something else). Having refcounts for two different objects share memory is Not Nice(tm)...
Reported-by: syzbot+5b1e53987f858500ec00@syzkaller.appspotmail.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- lib/percpu-refcount.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index af9302141bcf..e5c5315da274 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -76,6 +76,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release, data = kzalloc(sizeof(*ref->data), gfp); if (!data) { free_percpu((void __percpu *)ref->percpu_count_ptr); + ref->percpu_count_ptr = 0; return -ENOMEM; }
From: Thomas Bartschies thomas.bartschies@cvk.de
[ Upstream commit 015c44d7bff3f44d569716117becd570c179ca32 ]
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel produces invalid pfkey acquire messages, when these encryption modules are disabled. This happens because the availability of the algos wasn't checked in all necessary functions. This patch adds these checks.
Signed-off-by: Thomas Bartschies thomas.bartschies@cvk.de Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/key/af_key.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/key/af_key.c b/net/key/af_key.c index 92e9d75dba2f..339d95df19d3 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -2900,7 +2900,7 @@ static int count_ah_combs(const struct xfrm_tmpl *t) break; if (!aalg->pfkey_supported) continue; - if (aalg_tmpl_set(t, aalg)) + if (aalg_tmpl_set(t, aalg) && aalg->available) sz += sizeof(struct sadb_comb); } return sz + sizeof(struct sadb_prop); @@ -2918,7 +2918,7 @@ static int count_esp_combs(const struct xfrm_tmpl *t) if (!ealg->pfkey_supported) continue;
- if (!(ealg_tmpl_set(t, ealg))) + if (!(ealg_tmpl_set(t, ealg) && ealg->available)) continue;
for (k = 1; ; k++) { @@ -2929,7 +2929,7 @@ static int count_esp_combs(const struct xfrm_tmpl *t) if (!aalg->pfkey_supported) continue;
- if (aalg_tmpl_set(t, aalg)) + if (aalg_tmpl_set(t, aalg) && aalg->available) sz += sizeof(struct sadb_comb); } }
From: Lin Ma linma@zju.edu.cn
[ Upstream commit b8cedb7093b2d1394cae9b86494cba4b62d3a30a ]
When removing the pn533 device (i2c or USB), there is a logic error. The original code first cancels the worker (flush_delayed_work) and then destroys the workqueue (destroy_workqueue), leaving the timer the last one to be deleted (del_timer). This result in a possible race condition in a multi-core preempt-able kernel. That is, if the cleanup (pn53x_common_clean) is concurrently run with the timer handler (pn533_listen_mode_timer), the timer can queue the poll_work to the already destroyed workqueue, causing use-after-free.
This patch reorder the cleanup: it uses the del_timer_sync to make sure the handler is finished before the routine will destroy the workqueue. Note that the timer cannot be activated by the worker again.
static void pn533_wq_poll(struct work_struct *work) ... rc = pn533_send_poll_frame(dev); if (rc) return;
if (cur_mod->len == 0 && dev->poll_mod_count > 1) mod_timer(&dev->listen_timer, ...);
That is, the mod_timer can be called only when pn533_send_poll_frame() returns no error, which is impossible because the device is detaching and the lower driver should return ENODEV code.
Signed-off-by: Lin Ma linma@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/pn533/pn533.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/pn533/pn533.c b/drivers/nfc/pn533/pn533.c index d32aec0c334f..6dc0af63440f 100644 --- a/drivers/nfc/pn533/pn533.c +++ b/drivers/nfc/pn533/pn533.c @@ -2789,13 +2789,14 @@ void pn53x_common_clean(struct pn533 *priv) { struct pn533_cmd *cmd, *n;
+ /* delete the timer before cleanup the worker */ + del_timer_sync(&priv->listen_timer); + flush_delayed_work(&priv->poll_work); destroy_workqueue(priv->wq);
skb_queue_purge(&priv->resp_q);
- del_timer(&priv->listen_timer); - list_for_each_entry_safe(cmd, n, &priv->cmd_queue, queue) { list_del(&cmd->queue); kfree(cmd);
From: Joel Stanley joel@jms.id.au
[ Upstream commit 6fd45e79e8b93b8d22fb8fe22c32fbad7e9190bd ]
The AST2600 when using the i210 NIC over NC-SI has been observed to produce incorrect checksum results with specific MTU values. This was first observed when sending data across a long distance set of networks.
On a local network, the following test was performed using a 1MB file of random data.
On the receiver run this script:
#!/bin/bash while [ 1 ]; do # Zero the stats nstat -r > /dev/null nc -l 9899 > test-file # Check for checksum errors TcpInCsumErrors=$(nstat | grep TcpInCsumErrors) if [ -z "$TcpInCsumErrors" ]; then echo No TcpInCsumErrors else echo TcpInCsumErrors = $TcpInCsumErrors fi done
On an AST2600 system:
# nc <IP of receiver host> 9899 < test-file
The test was repeated with various MTU values:
# ip link set mtu 1410 dev eth0
The observed results:
1500 - good 1434 - bad 1400 - good 1410 - bad 1420 - good
The test was repeated after disabling tx checksumming:
# ethtool -K eth0 tx-checksumming off
And all MTU values tested resulted in transfers without error.
An issue with the driver cannot be ruled out, however there has been no bug discovered so far.
David has done the work to take the original bug report of slow data transfer between long distance connections and triaged it down to this test case.
The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers to the patch that introduced AST2600 support.
Reported-by: David Wilder wilder@us.ibm.com Reviewed-by: Dylan Hung dylan_hung@aspeedtech.com Signed-off-by: Joel Stanley joel@jms.id.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index e1df2dc810a2..0b833572205f 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -1910,6 +1910,11 @@ static int ftgmac100_probe(struct platform_device *pdev) /* AST2400 doesn't have working HW checksum generation */ if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac"))) netdev->hw_features &= ~NETIF_F_HW_CSUM; + + /* AST2600 tx checksum with NCSI is broken */ + if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac")) + netdev->hw_features &= ~NETIF_F_HW_CSUM; + if (np && of_get_property(np, "no-hw-checksum", NULL)) netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM); netdev->features |= netdev->hw_features;
From: Mika Westerberg mika.westerberg@linux.intel.com
[ Upstream commit 17a0f3acdc6ec8b89ad40f6e22165a4beee25663 ]
Before sending a MSI the hardware writes information pertinent to the interrupt cause to a memory location pointed by SMTICL register. This memory holds three double words where the least significant bit tells whether the interrupt cause of master/target/error is valid. The driver does not use this but we need to set it up because otherwise it will perform DMA write to the default address (0) and this will cause an IOMMU fault such as below:
DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Write] Request device [00:12.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set
To prevent this from happening, provide a proper DMA buffer for this that then gets mapped by the IOMMU accordingly.
Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: From: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-ismt.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/drivers/i2c/busses/i2c-ismt.c b/drivers/i2c/busses/i2c-ismt.c index a6187cbec2c9..af2c240e064e 100644 --- a/drivers/i2c/busses/i2c-ismt.c +++ b/drivers/i2c/busses/i2c-ismt.c @@ -82,6 +82,7 @@
#define ISMT_DESC_ENTRIES 2 /* number of descriptor entries */ #define ISMT_MAX_RETRIES 3 /* number of SMBus retries to attempt */ +#define ISMT_LOG_ENTRIES 3 /* number of interrupt cause log entries */
/* Hardware Descriptor Constants - Control Field */ #define ISMT_DESC_CWRL 0x01 /* Command/Write Length */ @@ -175,6 +176,8 @@ struct ismt_priv { u8 head; /* ring buffer head pointer */ struct completion cmp; /* interrupt completion */ u8 buffer[I2C_SMBUS_BLOCK_MAX + 16]; /* temp R/W data buffer */ + dma_addr_t log_dma; + u32 *log; };
static const struct pci_device_id ismt_ids[] = { @@ -411,6 +414,9 @@ static int ismt_access(struct i2c_adapter *adap, u16 addr, memset(desc, 0, sizeof(struct ismt_desc)); desc->tgtaddr_rw = ISMT_DESC_ADDR_RW(addr, read_write);
+ /* Always clear the log entries */ + memset(priv->log, 0, ISMT_LOG_ENTRIES * sizeof(u32)); + /* Initialize common control bits */ if (likely(pci_dev_msi_enabled(priv->pci_dev))) desc->control = ISMT_DESC_INT | ISMT_DESC_FAIR; @@ -708,6 +714,8 @@ static void ismt_hw_init(struct ismt_priv *priv) /* initialize the Master Descriptor Base Address (MDBA) */ writeq(priv->io_rng_dma, priv->smba + ISMT_MSTR_MDBA);
+ writeq(priv->log_dma, priv->smba + ISMT_GR_SMTICL); + /* initialize the Master Control Register (MCTRL) */ writel(ISMT_MCTRL_MEIE, priv->smba + ISMT_MSTR_MCTRL);
@@ -795,6 +803,12 @@ static int ismt_dev_init(struct ismt_priv *priv) priv->head = 0; init_completion(&priv->cmp);
+ priv->log = dmam_alloc_coherent(&priv->pci_dev->dev, + ISMT_LOG_ENTRIES * sizeof(u32), + &priv->log_dma, GFP_KERNEL); + if (!priv->log) + return -ENOMEM; + return 0; }
From: Piyush Malgujar pmalgujar@marvell.com
[ Upstream commit 03a35bc856ddc09f2cc1f4701adecfbf3b464cb3 ]
Due to i2c->adap.dev.fwnode not being set, ACPI_COMPANION() wasn't properly found for TWSI controllers.
Signed-off-by: Szymon Balcerak sbalcerak@marvell.com Signed-off-by: Piyush Malgujar pmalgujar@marvell.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i2c/busses/i2c-thunderx-pcidrv.c b/drivers/i2c/busses/i2c-thunderx-pcidrv.c index 12c90aa0900e..a77cd86fe75e 100644 --- a/drivers/i2c/busses/i2c-thunderx-pcidrv.c +++ b/drivers/i2c/busses/i2c-thunderx-pcidrv.c @@ -213,6 +213,7 @@ static int thunder_i2c_probe_pci(struct pci_dev *pdev, i2c->adap.bus_recovery_info = &octeon_i2c_recovery_info; i2c->adap.dev.parent = dev; i2c->adap.dev.of_node = pdev->dev.of_node; + i2c->adap.dev.fwnode = dev->fwnode; snprintf(i2c->adap.name, sizeof(i2c->adap.name), "Cavium ThunderX i2c adapter at %s", dev_name(dev)); i2c_set_adapdata(&i2c->adap, i2c);
From: Pablo Neira Ayuso pablo@netfilter.org
commit 520778042ccca019f3ffa136dd0ca565c486cedd upstream.
Since 3e135cd499bf ("netfilter: nft_dynset: dynamic stateful expression instantiation"), it is possible to attach stateful expressions to set elements.
cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") introduces conditional destruction on the object to accomodate transaction semantics.
nft_expr_init() calls expr->ops->init() first, then check for NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful lookup expressions which points to a set, which might lead to UAF since the set is not properly detached from the set->binding for this case. Anyway, this combination is non-sense from nf_tables perspective.
This patch fixes this problem by checking for NFT_STATEFUL_EXPR before expr->ops->init() is called.
The reporter provides a KASAN splat and a poc reproducer (similar to those autogenerated by syzbot to report use-after-free errors). It is unknown to me if they are using syzbot or if they use similar automated tool to locate the bug that they are reporting.
For the record, this is the KASAN splat.
[ 85.431824] ================================================================== [ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20 [ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776 [ 85.434756] [ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2 [ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Fixes: 0b2d8a7b638b ("netfilter: nf_tables: add helper functions for expression handling") Reported-and-tested-by: Aaron Adams edg-e@nccgroup.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2778,27 +2778,31 @@ static struct nft_expr *nft_expr_init(co
err = nf_tables_expr_parse(ctx, nla, &expr_info); if (err < 0) - goto err1; + goto err_expr_parse; + + err = -EOPNOTSUPP; + if (!(expr_info.ops->type->flags & NFT_EXPR_STATEFUL)) + goto err_expr_stateful;
err = -ENOMEM; expr = kzalloc(expr_info.ops->size, GFP_KERNEL); if (expr == NULL) - goto err2; + goto err_expr_stateful;
err = nf_tables_newexpr(ctx, &expr_info, expr); if (err < 0) - goto err3; + goto err_expr_new;
return expr; -err3: +err_expr_new: kfree(expr); -err2: +err_expr_stateful: owner = expr_info.ops->type->owner; if (expr_info.ops->type->release_ops) expr_info.ops->type->release_ops(expr_info.ops);
module_put(owner); -err1: +err_expr_parse: return ERR_PTR(err); }
@@ -5318,9 +5322,6 @@ struct nft_expr *nft_set_elem_expr_alloc return expr;
err = -EOPNOTSUPP; - if (!(expr->ops->type->flags & NFT_EXPR_STATEFUL)) - goto err_set_elem_expr; - if (expr->ops->type->flags & NFT_EXPR_GC) { if (set->flags & NFT_SET_TIMEOUT) goto err_set_elem_expr;
From: Dan Carpenter dan.carpenter@oracle.com
commit 690b2549b19563ec5ad53e5c82f6a944d910086e upstream.
The "data->block[0]" variable comes from the user and is a number between 0-255. It needs to be capped to prevent writing beyond the end of dma_buffer[].
Fixes: 5e9a97b1f449 ("i2c: ismt: Adding support for I2C_SMBUS_BLOCK_PROC_CALL") Reported-and-tested-by: Zheyu Ma zheyuma97@gmail.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-ismt.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/i2c/busses/i2c-ismt.c +++ b/drivers/i2c/busses/i2c-ismt.c @@ -528,6 +528,9 @@ static int ismt_access(struct i2c_adapte
case I2C_SMBUS_BLOCK_PROC_CALL: dev_dbg(dev, "I2C_SMBUS_BLOCK_PROC_CALL\n"); + if (data->block[0] > I2C_SMBUS_BLOCK_MAX) + return -EINVAL; + dma_size = I2C_SMBUS_BLOCK_MAX; desc->tgtaddr_rw = ISMT_DESC_ADDR_RW(addr, 1); desc->wr_len_cmd = data->block[0] + 1;
From: Stephen Brennan stephen.s.brennan@oracle.com
commit d1dc87763f406d4e67caf16dbe438a5647692395 upstream.
A rare BUG_ON triggered in assoc_array_gc:
[3430308.818153] kernel BUG at lib/assoc_array.c:1609!
Which corresponded to the statement currently at line 1593 upstream:
BUG_ON(assoc_array_ptr_is_meta(p));
Using the data from the core dump, I was able to generate a userspace reproducer[1] and determine the cause of the bug.
[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
After running the iterator on the entire branch, an internal tree node looked like the following:
NODE (nr_leaves_on_branch: 3) SLOT [0] NODE (2 leaves) SLOT [1] NODE (1 leaf) SLOT [2..f] NODE (empty)
In the userspace reproducer, the pr_devel output when compressing this node was:
-- compress node 0x5607cc089380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] after: 3
At slot 0, an internal node with 2 leaves could not be folded into the node, because there was only one available slot (slot 0). Thus, the internal node was retained. At slot 1, the node had one leaf, and was able to be folded in successfully. The remaining nodes had no leaves, and so were removed. By the end of the compression stage, there were 14 free slots, and only 3 leaf nodes. The tree was ascended and then its parent node was compressed. When this node was seen, it could not be folded, due to the internal node it contained.
The invariant for compression in this function is: whenever nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all leaf nodes. The compression step currently cannot guarantee this, given the corner case shown above.
To fix this issue, retry compression whenever we have retained a node, and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second compression will then allow the node in slot 1 to be folded in, satisfying the invariant. Below is the output of the reproducer once the fix is applied:
-- compress node 0x560e9c562380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] internal nodes remain despite enough space, retrying -- compress node 0x560e9c562380 -- free=14, leaves=1 [0] fold node 2/15 [nx 0] after: 3
Changes ======= DH: - Use false instead of 0. - Reorder the inserted lines in a couple of places to put retained before next_slot.
ver #2) - Fix typo in pr_devel, correct comparison to "<="
Fixes: 3cb989501c26 ("Add a generic associative array implementation.") Cc: stable@vger.kernel.org Signed-off-by: Stephen Brennan stephen.s.brennan@oracle.com Signed-off-by: David Howells dhowells@redhat.com cc: Andrew Morton akpm@linux-foundation.org cc: keyrings@vger.kernel.org Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.c... # v1 Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.c... # v2 Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/assoc_array.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/lib/assoc_array.c +++ b/lib/assoc_array.c @@ -1462,6 +1462,7 @@ int assoc_array_gc(struct assoc_array *a struct assoc_array_ptr *cursor, *ptr; struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp; unsigned long nr_leaves_on_tree; + bool retained; int keylen, slot, nr_free, next_slot, i;
pr_devel("-->%s()\n", __func__); @@ -1538,6 +1539,7 @@ continue_node: goto descend; }
+retry_compress: pr_devel("-- compress node %p --\n", new_n);
/* Count up the number of empty slots in this node and work out the @@ -1555,6 +1557,7 @@ continue_node: pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
/* See what we can fold in */ + retained = false; next_slot = 0; for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) { struct assoc_array_shortcut *s; @@ -1604,9 +1607,14 @@ continue_node: pr_devel("[%d] retain node %lu/%d [nx %d]\n", slot, child->nr_leaves_on_branch, nr_free + 1, next_slot); + retained = true; } }
+ if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) { + pr_devel("internal nodes remain despite enough space, retrying\n"); + goto retry_compress; + } pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
nr_leaves_on_tree = new_n->nr_leaves_on_branch;
From: Kuniyuki Iwashima kuniyu@amazon.co.jp
commit f485922d8fe4e44f6d52a5bb95a603b7c65554bb upstream.
Patch series "Fix data-races around epoll reported by KCSAN."
This series suppresses a false positive KCSAN's message and fixes a real data-race.
This patch (of 2):
pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage is set to 1, it never changes in other places. However, concurrent writes of a value trigger KCSAN, so let's make KCSAN happy.
BUG: KCSAN: data-race in pipe_poll / pipe_poll
write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6 Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014
Link: https://lkml.kernel.org/r/20220322002653.33865-1-kuniyu@amazon.co.jp Link: https://lkml.kernel.org/r/20220322002653.33865-2-kuniyu@amazon.co.jp Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.co.jp Cc: Alexander Duyck alexander.h.duyck@intel.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Davidlohr Bueso dave@stgolabs.net Cc: Kuniyuki Iwashima kuni1840@gmail.com Cc: "Soheil Hassas Yeganeh" soheil@google.com Cc: "Sridhar Samudrala" sridhar.samudrala@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pipe.c | 2 +- include/linux/pipe_fs_i.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/fs/pipe.c +++ b/fs/pipe.c @@ -652,7 +652,7 @@ pipe_poll(struct file *filp, poll_table unsigned int head, tail;
/* Epoll has some historical nasty semantics, this enables them */ - pipe->poll_usage = 1; + WRITE_ONCE(pipe->poll_usage, true);
/* * Reading pipe state only -- no need for acquiring the semaphore. --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -71,7 +71,7 @@ struct pipe_inode_info { unsigned int files; unsigned int r_counter; unsigned int w_counter; - unsigned int poll_usage; + bool poll_usage; struct page *tmp_page; struct fasync_struct *fasync_readers; struct fasync_struct *fasync_writers;
From: David Howells dhowells@redhat.com
commit 189b0ddc245139af81198d1a3637cac74f96e13a upstream.
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to prevent post_one_notification() from trying to insert into the ring whilst the ring is being replaced.
The occupancy check must be done after the lock is taken, and the lock must be taken after the new ring is allocated.
The bug can lead to an oops looking something like:
BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840 Read of size 4 at addr ffff88801cc72a70 by task poc/27196 ... Call Trace: post_one_notification.isra.0+0x62e/0x840 __post_watch_notification+0x3b7/0x650 key_create_or_update+0xb8b/0xd20 __do_sys_add_key+0x175/0x340 __x64_sys_add_key+0xbe/0x140 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero Day Initiative.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291 Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pipe.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-)
--- a/fs/pipe.c +++ b/fs/pipe.c @@ -1244,30 +1244,33 @@ unsigned int round_pipe_size(unsigned lo
/* * Resize the pipe ring to a number of slots. + * + * Note the pipe can be reduced in capacity, but only if the current + * occupancy doesn't exceed nr_slots; if it does, EBUSY will be + * returned instead. */ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) { struct pipe_buffer *bufs; unsigned int head, tail, mask, n;
- /* - * We can shrink the pipe, if arg is greater than the ring occupancy. - * Since we don't expect a lot of shrink+grow operations, just free and - * allocate again like we would do for growing. If the pipe currently - * contains more buffers than arg, then return busy. - */ - mask = pipe->ring_size - 1; - head = pipe->head; - tail = pipe->tail; - n = pipe_occupancy(pipe->head, pipe->tail); - if (nr_slots < n) - return -EBUSY; - bufs = kcalloc(nr_slots, sizeof(*bufs), GFP_KERNEL_ACCOUNT | __GFP_NOWARN); if (unlikely(!bufs)) return -ENOMEM;
+ spin_lock_irq(&pipe->rd_wait.lock); + mask = pipe->ring_size - 1; + head = pipe->head; + tail = pipe->tail; + + n = pipe_occupancy(head, tail); + if (nr_slots < n) { + spin_unlock_irq(&pipe->rd_wait.lock); + kfree(bufs); + return -EBUSY; + } + /* * The pipe array wraps around, so just start the new one at zero * and adjust the indices. @@ -1299,6 +1302,8 @@ int pipe_resize_ring(struct pipe_inode_i pipe->tail = tail; pipe->head = head;
+ spin_unlock_irq(&pipe->rd_wait.lock); + /* This might have made more room for writers */ wake_up_interruptible(&pipe->wr_wait); return 0;
From: Alex Elder elder@linaro.org
commit c5794097b269f15961ed78f7f27b50e51766dec9 upstream.
The aggregation byte limit for an endpoint is currently computed based on the endpoint's receive buffer size.
However, some bytes at the front of each receive buffer are reserved on the assumption that--as with SKBs--it might be useful to insert data (such as headers) before what lands in the buffer.
The aggregation byte limit currently doesn't take into account that reserved space, and as a result, aggregation could require space past that which is available in the buffer.
Fix this by reducing the size used to compute the aggregation byte limit by the NET_SKB_PAD offset reserved for each receive buffer.
Signed-off-by: Alex Elder elder@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ipa/ipa_endpoint.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ipa/ipa_endpoint.c +++ b/drivers/net/ipa/ipa_endpoint.c @@ -722,13 +722,15 @@ static void ipa_endpoint_init_aggr(struc
if (endpoint->data->aggregation) { if (!endpoint->toward_ipa) { + u32 buffer_size; bool close_eof; u32 limit;
val |= u32_encode_bits(IPA_ENABLE_AGGR, AGGR_EN_FMASK); val |= u32_encode_bits(IPA_GENERIC, AGGR_TYPE_FMASK);
- limit = ipa_aggr_size_kb(IPA_RX_BUFFER_SIZE); + buffer_size = IPA_RX_BUFFER_SIZE - NET_SKB_PAD; + limit = ipa_aggr_size_kb(buffer_size); val |= aggr_byte_limit_encoded(version, limit);
limit = IPA_AGGR_TIME_LIMIT;
From: Gustavo A. R. Silva gustavoars@kernel.org
commit 336feb502a715909a8136eb6a62a83d7268a353b upstream.
Fix the following -Wstringop-overflow warnings when building with GCC-11:
drivers/gpu/drm/i915/intel_pm.c:3106:9: warning: ‘intel_read_wm_latency’ accessing 16 bytes in a region of size 10 [-Wstringop-overflow=] 3106 | intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/i915/intel_pm.c:3106:9: note: referencing argument 2 of type ‘u16 *’ {aka ‘short unsigned int *’} drivers/gpu/drm/i915/intel_pm.c:2861:13: note: in a call to function ‘intel_read_wm_latency’ 2861 | static void intel_read_wm_latency(struct drm_i915_private *dev_priv, | ^~~~~~~~~~~~~~~~~~~~~
by removing the over-specified array size from the argument declarations.
It seems that this code is actually safe because the size of the array depends on the hardware generation, and the function checks for that.
Notice that wm can be an array of 5 elements: drivers/gpu/drm/i915/intel_pm.c:3109: intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency);
or an array of 8 elements: drivers/gpu/drm/i915/intel_pm.c:3131: intel_read_wm_latency(dev_priv, dev_priv->wm.skl_latency);
and the compiler legitimately complains about that.
This helps with the ongoing efforts to globally enable -Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/181 Signed-off-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/intel_pm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -2863,7 +2863,7 @@ static void ilk_compute_wm_level(const s }
static void intel_read_wm_latency(struct drm_i915_private *dev_priv, - u16 wm[8]) + u16 wm[]) { struct intel_uncore *uncore = &dev_priv->uncore;
From: Tadeusz Struk tadeusz.struk@linaro.org
commit 64ba4b15e5c045f8b746c6da5fc9be9a6b00b61d upstream.
Syzbot reported slab-out-of-bounds read in exfat_clear_bitmap. This was triggered by reproducer calling truncute with size 0, which causes the following trace:
BUG: KASAN: slab-out-of-bounds in exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 Read of size 8 at addr ffff888115aa9508 by task syz-executor251/365
Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack_lvl+0x1e2/0x24b lib/dump_stack.c:118 print_address_description+0x81/0x3c0 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report+0x1a4/0x1f0 mm/kasan/report.c:436 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:309 exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 exfat_free_cluster+0x25a/0x4a0 fs/exfat/fatent.c:181 __exfat_truncate+0x99e/0xe00 fs/exfat/file.c:217 exfat_truncate+0x11b/0x4f0 fs/exfat/file.c:243 exfat_setattr+0xa03/0xd40 fs/exfat/file.c:339 notify_change+0xb76/0xe10 fs/attr.c:336 do_truncate+0x1ea/0x2d0 fs/open.c:65
Move the is_valid_cluster() helper from fatent.c to a common header to make it reusable in other *.c files. And add is_valid_cluster() to validate if cluster number is within valid range in exfat_clear_bitmap() and exfat_set_bitmap().
Link: https://syzkaller.appspot.com/bug?id=50381fc73821ecae743b8cf24b4c9a04776f767... Reported-by: syzbot+a4087e40b9c13aad7892@syzkaller.appspotmail.com Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tadeusz Struk tadeusz.struk@linaro.org Reviewed-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/exfat/balloc.c | 8 ++++++-- fs/exfat/exfat_fs.h | 8 ++++++++ fs/exfat/fatent.c | 8 -------- 3 files changed, 14 insertions(+), 10 deletions(-)
--- a/fs/exfat/balloc.c +++ b/fs/exfat/balloc.c @@ -148,7 +148,9 @@ int exfat_set_bitmap(struct inode *inode struct super_block *sb = inode->i_sb; struct exfat_sb_info *sbi = EXFAT_SB(sb);
- WARN_ON(clu < EXFAT_FIRST_CLUSTER); + if (!is_valid_cluster(sbi, clu)) + return -EINVAL; + ent_idx = CLUSTER_TO_BITMAP_ENT(clu); i = BITMAP_OFFSET_SECTOR_INDEX(sb, ent_idx); b = BITMAP_OFFSET_BIT_IN_SECTOR(sb, ent_idx); @@ -166,7 +168,9 @@ void exfat_clear_bitmap(struct inode *in struct exfat_sb_info *sbi = EXFAT_SB(sb); struct exfat_mount_options *opts = &sbi->options;
- WARN_ON(clu < EXFAT_FIRST_CLUSTER); + if (!is_valid_cluster(sbi, clu)) + return; + ent_idx = CLUSTER_TO_BITMAP_ENT(clu); i = BITMAP_OFFSET_SECTOR_INDEX(sb, ent_idx); b = BITMAP_OFFSET_BIT_IN_SECTOR(sb, ent_idx); --- a/fs/exfat/exfat_fs.h +++ b/fs/exfat/exfat_fs.h @@ -381,6 +381,14 @@ static inline int exfat_sector_to_cluste EXFAT_RESERVED_CLUSTERS; }
+static inline bool is_valid_cluster(struct exfat_sb_info *sbi, + unsigned int clus) +{ + if (clus < EXFAT_FIRST_CLUSTER || sbi->num_clusters <= clus) + return false; + return true; +} + /* super.c */ int exfat_set_volume_dirty(struct super_block *sb); int exfat_clear_volume_dirty(struct super_block *sb); --- a/fs/exfat/fatent.c +++ b/fs/exfat/fatent.c @@ -81,14 +81,6 @@ int exfat_ent_set(struct super_block *sb return 0; }
-static inline bool is_valid_cluster(struct exfat_sb_info *sbi, - unsigned int clus) -{ - if (clus < EXFAT_FIRST_CLUSTER || sbi->num_clusters <= clus) - return false; - return true; -} - int exfat_ent_get(struct super_block *sb, unsigned int loc, unsigned int *content) {
From: Yuezhang Mo Yuezhang.Mo@sony.com
commit d8dad2588addd1d861ce19e7df3b702330f0c7e3 upstream.
During renaming, the parent directory information maybe updated. But the file/directory still references to the old parent directory information.
This bug will cause 2 problems.
(1) The renamed file can not be written.
[10768.175172] exFAT-fs (sda1): error, failed to bmap (inode : 7afd50e4 iblock : 0, err : -5) [10768.184285] exFAT-fs (sda1): Filesystem has been set read-only ash: write error: Input/output error
(2) Some dentries of the renamed file/directory are not set to deleted after removing the file/directory.
exfat_update_parent_info() is a workaround for the wrong parent directory information being used after renaming. Now that bug is fixed, this is no longer needed, so remove it.
Fixes: 5f2aa075070c ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Yuezhang Mo Yuezhang.Mo@sony.com Reviewed-by: Andy Wu Andy.Wu@sony.com Reviewed-by: Aoyama Wataru wataru.aoyama@sony.com Reviewed-by: Daniel Palmer daniel.palmer@sony.com Reviewed-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/exfat/namei.c | 27 +-------------------------- 1 file changed, 1 insertion(+), 26 deletions(-)
--- a/fs/exfat/namei.c +++ b/fs/exfat/namei.c @@ -1069,6 +1069,7 @@ static int exfat_rename_file(struct inod
exfat_remove_entries(inode, p_dir, oldentry, 0, num_old_entries); + ei->dir = *p_dir; ei->entry = newentry; } else { if (exfat_get_entry_type(epold) == TYPE_FILE) { @@ -1159,28 +1160,6 @@ static int exfat_move_file(struct inode return 0; }
-static void exfat_update_parent_info(struct exfat_inode_info *ei, - struct inode *parent_inode) -{ - struct exfat_sb_info *sbi = EXFAT_SB(parent_inode->i_sb); - struct exfat_inode_info *parent_ei = EXFAT_I(parent_inode); - loff_t parent_isize = i_size_read(parent_inode); - - /* - * the problem that struct exfat_inode_info caches wrong parent info. - * - * because of flag-mismatch of ei->dir, - * there is abnormal traversing cluster chain. - */ - if (unlikely(parent_ei->flags != ei->dir.flags || - parent_isize != EXFAT_CLU_TO_B(ei->dir.size, sbi) || - parent_ei->start_clu != ei->dir.dir)) { - exfat_chain_set(&ei->dir, parent_ei->start_clu, - EXFAT_B_TO_CLU_ROUND_UP(parent_isize, sbi), - parent_ei->flags); - } -} - /* rename or move a old file into a new file */ static int __exfat_rename(struct inode *old_parent_inode, struct exfat_inode_info *ei, struct inode *new_parent_inode, @@ -1211,8 +1190,6 @@ static int __exfat_rename(struct inode * return -ENOENT; }
- exfat_update_parent_info(ei, old_parent_inode); - exfat_chain_dup(&olddir, &ei->dir); dentry = ei->entry;
@@ -1233,8 +1210,6 @@ static int __exfat_rename(struct inode * goto out; }
- exfat_update_parent_info(new_ei, new_parent_inode); - p_dir = &(new_ei->dir); new_entry = new_ei->entry; ep = exfat_get_dentry(sb, p_dir, new_entry, &new_bh, NULL);
From: "Justin M. Forbes" jforbes@fedoraproject.org
commit e56e18985596617ae426ed5997fb2e737cffb58b upstream.
Commit 6048fdcc5f269 ("lib/crypto: blake2s: include as built-in") took away a number of prompt texts from other crypto libraries. This makes values flip from built-in to module when oldconfig runs, and causes problems when these crypto libs need to be built in for thingslike BIG_KEYS.
Fixes: 6048fdcc5f269 ("lib/crypto: blake2s: include as built-in") Cc: Herbert Xu herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org Signed-off-by: Justin M. Forbes jforbes@fedoraproject.org [Jason: - moved menu into submenu of lib/ instead of root menu - fixed chacha sub-dependencies for CONFIG_CRYPTO] Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/Kconfig | 2 -- lib/Kconfig | 2 ++ lib/crypto/Kconfig | 17 ++++++++++++----- 3 files changed, 14 insertions(+), 7 deletions(-)
--- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1924,5 +1924,3 @@ source "crypto/asymmetric_keys/Kconfig" source "certs/Kconfig"
endif # if CRYPTO - -source "lib/crypto/Kconfig" --- a/lib/Kconfig +++ b/lib/Kconfig @@ -121,6 +121,8 @@ config INDIRECT_IOMEM_FALLBACK mmio accesses when the IO memory address is not a registered emulated region.
+source "lib/crypto/Kconfig" + config CRC_CCITT tristate "CRC-CCITT functions" help --- a/lib/crypto/Kconfig +++ b/lib/crypto/Kconfig @@ -1,5 +1,7 @@ # SPDX-License-Identifier: GPL-2.0
+menu "Crypto library routines" + config CRYPTO_LIB_AES tristate
@@ -31,7 +33,7 @@ config CRYPTO_ARCH_HAVE_LIB_CHACHA
config CRYPTO_LIB_CHACHA_GENERIC tristate - select CRYPTO_ALGAPI + select XOR_BLOCKS help This symbol can be depended upon by arch implementations of the ChaCha library interface that require the generic code as a @@ -40,7 +42,8 @@ config CRYPTO_LIB_CHACHA_GENERIC of CRYPTO_LIB_CHACHA.
config CRYPTO_LIB_CHACHA - tristate + tristate "ChaCha library interface" + depends on CRYPTO depends on CRYPTO_ARCH_HAVE_LIB_CHACHA || !CRYPTO_ARCH_HAVE_LIB_CHACHA select CRYPTO_LIB_CHACHA_GENERIC if CRYPTO_ARCH_HAVE_LIB_CHACHA=n help @@ -65,7 +68,7 @@ config CRYPTO_LIB_CURVE25519_GENERIC of CRYPTO_LIB_CURVE25519.
config CRYPTO_LIB_CURVE25519 - tristate + tristate "Curve25519 scalar multiplication library" depends on CRYPTO_ARCH_HAVE_LIB_CURVE25519 || !CRYPTO_ARCH_HAVE_LIB_CURVE25519 select CRYPTO_LIB_CURVE25519_GENERIC if CRYPTO_ARCH_HAVE_LIB_CURVE25519=n help @@ -100,7 +103,7 @@ config CRYPTO_LIB_POLY1305_GENERIC of CRYPTO_LIB_POLY1305.
config CRYPTO_LIB_POLY1305 - tristate + tristate "Poly1305 library interface" depends on CRYPTO_ARCH_HAVE_LIB_POLY1305 || !CRYPTO_ARCH_HAVE_LIB_POLY1305 select CRYPTO_LIB_POLY1305_GENERIC if CRYPTO_ARCH_HAVE_LIB_POLY1305=n help @@ -109,14 +112,18 @@ config CRYPTO_LIB_POLY1305 is available and enabled.
config CRYPTO_LIB_CHACHA20POLY1305 - tristate + tristate "ChaCha20-Poly1305 AEAD support (8-byte nonce library version)" depends on CRYPTO_ARCH_HAVE_LIB_CHACHA || !CRYPTO_ARCH_HAVE_LIB_CHACHA depends on CRYPTO_ARCH_HAVE_LIB_POLY1305 || !CRYPTO_ARCH_HAVE_LIB_POLY1305 + depends on CRYPTO select CRYPTO_LIB_CHACHA select CRYPTO_LIB_POLY1305 + select CRYPTO_ALGAPI
config CRYPTO_LIB_SHA256 tristate
config CRYPTO_LIB_SM4 tristate + +endmenu
From: Nicolai Stange nstange@suse.de
commit ce8ce31b2c5c8b18667784b8c515650c65d57b4e upstream.
There are two different randomness sources the DRBGs are getting seeded from, namely the jitterentropy source (if enabled) and get_random_bytes(). At initial DRBG seeding time during boot, the latter might not have collected sufficient entropy for seeding itself yet and thus, the DRBG implementation schedules a reseed work from a random_ready_callback once that has happened. This is particularly important for the !->pr DRBG instances, for which (almost) no further reseeds are getting triggered during their lifetime.
Because collecting data from the jitterentropy source is a rather expensive operation, the aforementioned asynchronously scheduled reseed work restricts itself to get_random_bytes() only. That is, it in some sense amends the initial DRBG seed derived from jitterentropy output at full (estimated) entropy with fresh randomness obtained from get_random_bytes() once that has been seeded with sufficient entropy itself.
With the advent of rng_is_initialized(), there is no real need for doing the reseed operation from an asynchronously scheduled work anymore and a subsequent patch will make it synchronous by moving it next to related logic already present in drbg_generate().
However, for tracking whether a full reseed including the jitterentropy source is required or a "partial" reseed involving only get_random_bytes() would be sufficient already, the boolean struct drbg_state's ->seeded member must become a tristate value.
Prepare for this by introducing the new enum drbg_seed_state and change struct drbg_state's ->seeded member's type from bool to that type.
For facilitating review, enum drbg_seed_state is made to only contain two members corresponding to the former ->seeded values of false and true resp. at this point: DRBG_SEED_STATE_UNSEEDED and DRBG_SEED_STATE_FULL. A third one for tracking the intermediate state of "seeded from jitterentropy only" will be introduced with a subsequent patch.
There is no change in behaviour at this point.
Signed-off-by: Nicolai Stange nstange@suse.de Reviewed-by: Stephan Müller smueller@chronox.de Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/drbg.c | 19 ++++++++++--------- include/crypto/drbg.h | 7 ++++++- 2 files changed, 16 insertions(+), 10 deletions(-)
--- a/crypto/drbg.c +++ b/crypto/drbg.c @@ -1043,7 +1043,7 @@ static inline int __drbg_seed(struct drb if (ret) return ret;
- drbg->seeded = true; + drbg->seeded = DRBG_SEED_STATE_FULL; /* 10.1.1.2 / 10.1.1.3 step 5 */ drbg->reseed_ctr = 1;
@@ -1088,14 +1088,14 @@ static void drbg_async_seed(struct work_ if (ret) goto unlock;
- /* Set seeded to false so that if __drbg_seed fails the - * next generate call will trigger a reseed. + /* Reset ->seeded so that if __drbg_seed fails the next + * generate call will trigger a reseed. */ - drbg->seeded = false; + drbg->seeded = DRBG_SEED_STATE_UNSEEDED;
__drbg_seed(drbg, &seedlist, true);
- if (drbg->seeded) + if (drbg->seeded == DRBG_SEED_STATE_FULL) drbg->reseed_threshold = drbg_max_requests(drbg);
unlock: @@ -1386,13 +1386,14 @@ static int drbg_generate(struct drbg_sta * here. The spec is a bit convoluted here, we make it simpler. */ if (drbg->reseed_threshold < drbg->reseed_ctr) - drbg->seeded = false; + drbg->seeded = DRBG_SEED_STATE_UNSEEDED;
- if (drbg->pr || !drbg->seeded) { + if (drbg->pr || drbg->seeded == DRBG_SEED_STATE_UNSEEDED) { pr_devel("DRBG: reseeding before generation (prediction " "resistance: %s, state %s)\n", drbg->pr ? "true" : "false", - drbg->seeded ? "seeded" : "unseeded"); + (drbg->seeded == DRBG_SEED_STATE_FULL ? + "seeded" : "unseeded")); /* 9.3.1 steps 7.1 through 7.3 */ len = drbg_seed(drbg, addtl, true); if (len) @@ -1577,7 +1578,7 @@ static int drbg_instantiate(struct drbg_ if (!drbg->core) { drbg->core = &drbg_cores[coreref]; drbg->pr = pr; - drbg->seeded = false; + drbg->seeded = DRBG_SEED_STATE_UNSEEDED; drbg->reseed_threshold = drbg_max_requests(drbg);
ret = drbg_alloc_state(drbg); --- a/include/crypto/drbg.h +++ b/include/crypto/drbg.h @@ -105,6 +105,11 @@ struct drbg_test_data { struct drbg_string *testentropy; /* TEST PARAMETER: test entropy */ };
+enum drbg_seed_state { + DRBG_SEED_STATE_UNSEEDED, + DRBG_SEED_STATE_FULL, +}; + struct drbg_state { struct mutex drbg_mutex; /* lock around DRBG */ unsigned char *V; /* internal state 10.1.1.1 1a) */ @@ -127,7 +132,7 @@ struct drbg_state { struct crypto_wait ctr_wait; /* CTR mode async wait obj */ struct scatterlist sg_in, sg_out; /* CTR mode SGLs */
- bool seeded; /* DRBG fully seeded? */ + enum drbg_seed_state seeded; /* DRBG fully seeded? */ bool pr; /* Prediction resistance enabled? */ bool fips_primed; /* Continuous test primed? */ unsigned char *prev; /* FIPS 140-2 continuous test value */
From: Nicolai Stange nstange@suse.de
commit 2bcd25443868aa8863779a6ebc6c9319633025d2 upstream.
Currently, the DRBG implementation schedules asynchronous works from random_ready_callbacks for reseeding the DRBG instances with output from get_random_bytes() once the latter has sufficient entropy available.
However, as the get_random_bytes() initialization state can get queried by means of rng_is_initialized() now, there is no real need for this asynchronous reseeding logic anymore and it's better to keep things simple by doing it synchronously when needed instead, i.e. from drbg_generate() once rng_is_initialized() has flipped to true.
Of course, for this to work, drbg_generate() would need some means by which it can tell whether or not rng_is_initialized() has flipped to true since the last seeding from get_random_bytes(). Or equivalently, whether or not the last seed from get_random_bytes() has happened when rng_is_initialized() was still evaluating to false.
As it currently stands, enum drbg_seed_state allows for the representation of two different DRBG seeding states: DRBG_SEED_STATE_UNSEEDED and DRBG_SEED_STATE_FULL. The former makes drbg_generate() to invoke a full reseeding operation involving both, the rather expensive jitterentropy as well as the get_random_bytes() randomness sources. The DRBG_SEED_STATE_FULL state on the other hand implies that no reseeding at all is required for a !->pr DRBG variant.
Introduce the new DRBG_SEED_STATE_PARTIAL state to enum drbg_seed_state for representing the condition that a DRBG was being seeded when rng_is_initialized() had still been false. In particular, this new state implies that - the given DRBG instance has been fully seeded from the jitterentropy source (if enabled) - and drbg_generate() is supposed to reseed from get_random_bytes() *only* once rng_is_initialized() turns to true.
Up to now, the __drbg_seed() helper used to set the given DRBG instance's ->seeded state to constant DRBG_SEED_STATE_FULL. Introduce a new argument allowing for the specification of the to be written ->seeded value instead. Make the first of its two callers, drbg_seed(), determine the appropriate value based on rng_is_initialized(). The remaining caller, drbg_async_seed(), is known to get invoked only once rng_is_initialized() is true, hence let it pass constant DRBG_SEED_STATE_FULL for the new argument to __drbg_seed().
There is no change in behaviour, except for that the pr_devel() in drbg_generate() would now report "unseeded" for ->pr DRBG instances which had last been seeded when rng_is_initialized() was still evaluating to false.
Signed-off-by: Nicolai Stange nstange@suse.de Reviewed-by: Stephan Müller smueller@chronox.de Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/drbg.c | 12 ++++++++---- include/crypto/drbg.h | 1 + 2 files changed, 9 insertions(+), 4 deletions(-)
--- a/crypto/drbg.c +++ b/crypto/drbg.c @@ -1036,14 +1036,14 @@ static const struct drbg_state_ops drbg_ ******************************************************************/
static inline int __drbg_seed(struct drbg_state *drbg, struct list_head *seed, - int reseed) + int reseed, enum drbg_seed_state new_seed_state) { int ret = drbg->d_ops->update(drbg, seed, reseed);
if (ret) return ret;
- drbg->seeded = DRBG_SEED_STATE_FULL; + drbg->seeded = new_seed_state; /* 10.1.1.2 / 10.1.1.3 step 5 */ drbg->reseed_ctr = 1;
@@ -1093,7 +1093,7 @@ static void drbg_async_seed(struct work_ */ drbg->seeded = DRBG_SEED_STATE_UNSEEDED;
- __drbg_seed(drbg, &seedlist, true); + __drbg_seed(drbg, &seedlist, true, DRBG_SEED_STATE_FULL);
if (drbg->seeded == DRBG_SEED_STATE_FULL) drbg->reseed_threshold = drbg_max_requests(drbg); @@ -1123,6 +1123,7 @@ static int drbg_seed(struct drbg_state * unsigned int entropylen = drbg_sec_strength(drbg->core->flags); struct drbg_string data1; LIST_HEAD(seedlist); + enum drbg_seed_state new_seed_state = DRBG_SEED_STATE_FULL;
/* 9.1 / 9.2 / 9.3.1 step 3 */ if (pers && pers->len > (drbg_max_addtl(drbg))) { @@ -1150,6 +1151,9 @@ static int drbg_seed(struct drbg_state * BUG_ON((entropylen * 2) > sizeof(entropy));
/* Get seed from in-kernel /dev/urandom */ + if (!rng_is_initialized()) + new_seed_state = DRBG_SEED_STATE_PARTIAL; + ret = drbg_get_random_bytes(drbg, entropy, entropylen); if (ret) goto out; @@ -1206,7 +1210,7 @@ static int drbg_seed(struct drbg_state * memset(drbg->C, 0, drbg_statelen(drbg)); }
- ret = __drbg_seed(drbg, &seedlist, reseed); + ret = __drbg_seed(drbg, &seedlist, reseed, new_seed_state);
out: memzero_explicit(entropy, entropylen * 2); --- a/include/crypto/drbg.h +++ b/include/crypto/drbg.h @@ -107,6 +107,7 @@ struct drbg_test_data {
enum drbg_seed_state { DRBG_SEED_STATE_UNSEEDED, + DRBG_SEED_STATE_PARTIAL, /* Seeded with !rng_is_initialized() */ DRBG_SEED_STATE_FULL, };
From: Nicolai Stange nstange@suse.de
commit 262d83a4290c331cd4f617a457408bdb82fbb738 upstream.
Since commit 42ea507fae1a ("crypto: drbg - reseed often if seedsource is degraded"), the maximum seed lifetime represented by ->reseed_threshold gets temporarily lowered if the get_random_bytes() source cannot provide sufficient entropy yet, as is common during boot, and restored back to the original value again once that has changed.
More specifically, if the add_random_ready_callback() invoked from drbg_prepare_hrng() in the course of DRBG instantiation does not return -EALREADY, that is, if get_random_bytes() has not been fully initialized at this point yet, drbg_prepare_hrng() will lower ->reseed_threshold to a value of 50. The drbg_async_seed() scheduled from said random_ready_callback will eventually restore the original value.
A future patch will replace the random_ready_callback based notification mechanism and thus, there will be no add_random_ready_callback() return value anymore which could get compared to -EALREADY.
However, there's __drbg_seed() which gets invoked in the course of both, the DRBG instantiation as well as the eventual reseeding from get_random_bytes() in aforementioned drbg_async_seed(), if any. Moreover, it knows about the get_random_bytes() initialization state by the time the seed data had been obtained from it: the new_seed_state argument introduced with the previous patch would get set to DRBG_SEED_STATE_PARTIAL in case get_random_bytes() had not been fully initialized yet and to DRBG_SEED_STATE_FULL otherwise. Thus, __drbg_seed() provides a convenient alternative for managing that ->reseed_threshold lowering and restoring at a central place.
Move all ->reseed_threshold adjustment code from drbg_prepare_hrng() and drbg_async_seed() respectively to __drbg_seed(). Make __drbg_seed() lower the ->reseed_threshold to 50 in case its new_seed_state argument equals DRBG_SEED_STATE_PARTIAL and let it restore the original value otherwise.
There is no change in behaviour.
Signed-off-by: Nicolai Stange nstange@suse.de Reviewed-by: Stephan Müller smueller@chronox.de Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/drbg.c | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-)
--- a/crypto/drbg.c +++ b/crypto/drbg.c @@ -1047,6 +1047,27 @@ static inline int __drbg_seed(struct drb /* 10.1.1.2 / 10.1.1.3 step 5 */ drbg->reseed_ctr = 1;
+ switch (drbg->seeded) { + case DRBG_SEED_STATE_UNSEEDED: + /* Impossible, but handle it to silence compiler warnings. */ + fallthrough; + case DRBG_SEED_STATE_PARTIAL: + /* + * Require frequent reseeds until the seed source is + * fully initialized. + */ + drbg->reseed_threshold = 50; + break; + + case DRBG_SEED_STATE_FULL: + /* + * Seed source has become fully initialized, frequent + * reseeds no longer required. + */ + drbg->reseed_threshold = drbg_max_requests(drbg); + break; + } + return ret; }
@@ -1095,9 +1116,6 @@ static void drbg_async_seed(struct work_
__drbg_seed(drbg, &seedlist, true, DRBG_SEED_STATE_FULL);
- if (drbg->seeded == DRBG_SEED_STATE_FULL) - drbg->reseed_threshold = drbg_max_requests(drbg); - unlock: mutex_unlock(&drbg->drbg_mutex);
@@ -1533,12 +1551,6 @@ static int drbg_prepare_hrng(struct drbg return err; }
- /* - * Require frequent reseeds until the seed source is fully - * initialized. - */ - drbg->reseed_threshold = 50; - return err; }
From: Nicolai Stange nstange@suse.de
commit 074bcd4000e0d812bc253f86fedc40f81ed59ccc upstream.
get_random_bytes() usually hasn't full entropy available by the time DRBG instances are first getting seeded from it during boot. Thus, the DRBG implementation registers random_ready_callbacks which would in turn schedule some work for reseeding the DRBGs once get_random_bytes() has sufficient entropy available.
For reference, the relevant history around handling DRBG (re)seeding in the context of a not yet fully seeded get_random_bytes() is:
commit 16b369a91d0d ("random: Blocking API for accessing nonblocking_pool") commit 4c7879907edd ("crypto: drbg - add async seeding operation")
commit 205a525c3342 ("random: Add callback API for random pool readiness") commit 57225e679788 ("crypto: drbg - Use callback API for random readiness") commit c2719503f5e1 ("random: Remove kernel blocking API")
However, some time later, the initialization state of get_random_bytes() has been made queryable via rng_is_initialized() introduced with commit 9a47249d444d ("random: Make crng state queryable"). This primitive now allows for streamlining the DRBG reseeding from get_random_bytes() by replacing that aforementioned asynchronous work scheduling from random_ready_callbacks with some simpler, synchronous code in drbg_generate() next to the related logic already present therein. Apart from improving overall code readability, this change will also enable DRBG users to rely on wait_for_random_bytes() for ensuring that the initial seeding has completed, if desired.
The previous patches already laid the grounds by making drbg_seed() to record at each DRBG instance whether it was being seeded at a time when rng_is_initialized() still had been false as indicated by ->seeded == DRBG_SEED_STATE_PARTIAL.
All that remains to be done now is to make drbg_generate() check for this condition, determine whether rng_is_initialized() has flipped to true in the meanwhile and invoke a reseed from get_random_bytes() if so.
Make this move: - rename the former drbg_async_seed() work handler, i.e. the one in charge of reseeding a DRBG instance from get_random_bytes(), to "drbg_seed_from_random()", - change its signature as appropriate, i.e. make it take a struct drbg_state rather than a work_struct and change its return type from "void" to "int" in order to allow for passing error information from e.g. its __drbg_seed() invocation onwards to callers, - make drbg_generate() invoke this drbg_seed_from_random() once it encounters a DRBG instance with ->seeded == DRBG_SEED_STATE_PARTIAL by the time rng_is_initialized() has flipped to true and - prune everything related to the former, random_ready_callback based mechanism.
As drbg_seed_from_random() is now getting invoked from drbg_generate() with the ->drbg_mutex being held, it must not attempt to recursively grab it once again. Remove the corresponding mutex operations from what is now drbg_seed_from_random(). Furthermore, as drbg_seed_from_random() can now report errors directly to its caller, there's no need for it to temporarily switch the DRBG's ->seeded state to DRBG_SEED_STATE_UNSEEDED so that a failure of the subsequently invoked __drbg_seed() will get signaled to drbg_generate(). Don't do it then.
Signed-off-by: Nicolai Stange nstange@suse.de Signed-off-by: Herbert Xu herbert@gondor.apana.org.au [Jason: for stable, undid the modifications for the backport of 5acd3548.] Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/drbg.c | 61 +++++++++----------------------------------------- drivers/char/random.c | 2 - include/crypto/drbg.h | 2 - 3 files changed, 11 insertions(+), 54 deletions(-)
--- a/crypto/drbg.c +++ b/crypto/drbg.c @@ -1087,12 +1087,10 @@ static inline int drbg_get_random_bytes( return 0; }
-static void drbg_async_seed(struct work_struct *work) +static int drbg_seed_from_random(struct drbg_state *drbg) { struct drbg_string data; LIST_HEAD(seedlist); - struct drbg_state *drbg = container_of(work, struct drbg_state, - seed_work); unsigned int entropylen = drbg_sec_strength(drbg->core->flags); unsigned char entropy[32]; int ret; @@ -1103,23 +1101,15 @@ static void drbg_async_seed(struct work_ drbg_string_fill(&data, entropy, entropylen); list_add_tail(&data.list, &seedlist);
- mutex_lock(&drbg->drbg_mutex); - ret = drbg_get_random_bytes(drbg, entropy, entropylen); if (ret) - goto unlock; - - /* Reset ->seeded so that if __drbg_seed fails the next - * generate call will trigger a reseed. - */ - drbg->seeded = DRBG_SEED_STATE_UNSEEDED; + goto out;
- __drbg_seed(drbg, &seedlist, true, DRBG_SEED_STATE_FULL); - -unlock: - mutex_unlock(&drbg->drbg_mutex); + ret = __drbg_seed(drbg, &seedlist, true, DRBG_SEED_STATE_FULL);
+out: memzero_explicit(entropy, entropylen); + return ret; }
/* @@ -1422,6 +1412,11 @@ static int drbg_generate(struct drbg_sta goto err; /* 9.3.1 step 7.4 */ addtl = NULL; + } else if (rng_is_initialized() && + drbg->seeded == DRBG_SEED_STATE_PARTIAL) { + len = drbg_seed_from_random(drbg); + if (len) + goto err; }
if (addtl && 0 < addtl->len) @@ -1514,44 +1509,15 @@ static int drbg_generate_long(struct drb return 0; }
-static int drbg_schedule_async_seed(struct notifier_block *nb, unsigned long action, void *data) -{ - struct drbg_state *drbg = container_of(nb, struct drbg_state, - random_ready); - - schedule_work(&drbg->seed_work); - return 0; -} - static int drbg_prepare_hrng(struct drbg_state *drbg) { - int err; - /* We do not need an HRNG in test mode. */ if (list_empty(&drbg->test_data.list)) return 0;
drbg->jent = crypto_alloc_rng("jitterentropy_rng", 0, 0);
- INIT_WORK(&drbg->seed_work, drbg_async_seed); - - drbg->random_ready.notifier_call = drbg_schedule_async_seed; - err = register_random_ready_notifier(&drbg->random_ready); - - switch (err) { - case 0: - break; - - case -EALREADY: - err = 0; - fallthrough; - - default: - drbg->random_ready.notifier_call = NULL; - return err; - } - - return err; + return 0; }
/* @@ -1645,11 +1611,6 @@ free_everything: */ static int drbg_uninstantiate(struct drbg_state *drbg) { - if (drbg->random_ready.notifier_call) { - unregister_random_ready_notifier(&drbg->random_ready); - cancel_work_sync(&drbg->seed_work); - } - if (!IS_ERR_OR_NULL(drbg->jent)) crypto_free_rng(drbg->jent); drbg->jent = NULL; --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -163,7 +163,6 @@ int __cold register_random_ready_notifie spin_unlock_irqrestore(&random_ready_chain_lock, flags); return ret; } -EXPORT_SYMBOL(register_random_ready_notifier);
/* * Delete a previously registered readiness callback function. @@ -178,7 +177,6 @@ int __cold unregister_random_ready_notif spin_unlock_irqrestore(&random_ready_chain_lock, flags); return ret; } -EXPORT_SYMBOL(unregister_random_ready_notifier);
static void __cold process_random_ready_list(void) { --- a/include/crypto/drbg.h +++ b/include/crypto/drbg.h @@ -137,12 +137,10 @@ struct drbg_state { bool pr; /* Prediction resistance enabled? */ bool fips_primed; /* Continuous test primed? */ unsigned char *prev; /* FIPS 140-2 continuous test value */ - struct work_struct seed_work; /* asynchronous seeding support */ struct crypto_rng *jent; const struct drbg_state_ops *d_ops; const struct drbg_core *core; struct drbg_string test_data; - struct notifier_block random_ready; };
static inline __u8 drbg_statelen(struct drbg_state *drbg)
From: Pablo Neira Ayuso pablo@netfilter.org
commit fecf31ee395b0295f2d7260aa29946b7605f7c85 upstream.
Add several sanity checks for nft_set_desc_concat_parse():
- validate desc->field_count not larger than desc->field_len array. - field length cannot be larger than desc->field_len (ie. U8_MAX) - total length of the concatenation cannot be larger than register array.
Joint work with Florian Westphal.
Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Reported-by: zhangziming.zzm@antgroup.com Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4151,6 +4151,9 @@ static int nft_set_desc_concat_parse(con u32 len; int err;
+ if (desc->field_count >= ARRAY_SIZE(desc->field_len)) + return -E2BIG; + err = nla_parse_nested_deprecated(tb, NFTA_SET_FIELD_MAX, attr, nft_concat_policy, NULL); if (err < 0) @@ -4160,9 +4163,8 @@ static int nft_set_desc_concat_parse(con return -EINVAL;
len = ntohl(nla_get_be32(tb[NFTA_SET_FIELD_LEN])); - - if (len * BITS_PER_BYTE / 32 > NFT_REG32_COUNT) - return -E2BIG; + if (!len || len > U8_MAX) + return -EINVAL;
desc->field_len[desc->field_count++] = len;
@@ -4173,7 +4175,8 @@ static int nft_set_desc_concat(struct nf const struct nlattr *nla) { struct nlattr *attr; - int rem, err; + u32 num_regs = 0; + int rem, err, i;
nla_for_each_nested(attr, nla, rem) { if (nla_type(attr) != NFTA_LIST_ELEM) @@ -4184,6 +4187,12 @@ static int nft_set_desc_concat(struct nf return err; }
+ for (i = 0; i < desc->field_count; i++) + num_regs += DIV_ROUND_UP(desc->field_len[i], sizeof(u32)); + + if (num_regs > NFT_REG32_COUNT) + return -E2BIG; + return 0; }
From: Pablo Neira Ayuso pablo@netfilter.org
commit 3923b1e4406680d57da7e873da77b1683035d83f upstream.
clean_net() runs in workqueue while walking over the lists, grab mutex.
Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -9746,7 +9746,11 @@ static int __net_init nf_tables_init_net
static void __net_exit nf_tables_pre_exit_net(struct net *net) { + struct nftables_pernet *nft_net = nft_pernet(net); + + mutex_lock(&nft_net->commit_mutex); __nft_release_hooks(net); + mutex_unlock(&nft_net->commit_mutex); }
static void __net_exit nf_tables_exit_net(struct net *net)
From: Pablo Neira Ayuso pablo@netfilter.org
commit f9a43007d3f7ba76d5e7f9421094f00f2ef202f8 upstream.
__nft_release_hooks() is called from pre_netns exit path which unregisters the hooks, then the NETDEV_UNREGISTER event is triggered which unregisters the hooks again.
[ 565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270 [...] [ 565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G E 5.18.0-rc7+ #27 [ 565.253682] Workqueue: netns cleanup_net [ 565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270 [...] [ 565.297120] Call Trace: [ 565.300900] <TASK> [ 565.304683] nf_tables_flowtable_event+0x16a/0x220 [nf_tables] [ 565.308518] raw_notifier_call_chain+0x63/0x80 [ 565.312386] unregister_netdevice_many+0x54f/0xb50
Unregister and destroy netdev hook from netns pre_exit via kfree_rcu so the NETDEV_UNREGISTER path see unregistered hooks.
Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 54 +++++++++++++++++++++++++++++++----------- 1 file changed, 41 insertions(+), 13 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -222,12 +222,18 @@ err_register: }
static void nft_netdev_unregister_hooks(struct net *net, - struct list_head *hook_list) + struct list_head *hook_list, + bool release_netdev) { - struct nft_hook *hook; + struct nft_hook *hook, *next;
- list_for_each_entry(hook, hook_list, list) + list_for_each_entry_safe(hook, next, hook_list, list) { nf_unregister_net_hook(net, &hook->ops); + if (release_netdev) { + list_del(&hook->list); + kfree_rcu(hook, rcu); + } + } }
static int nf_tables_register_hook(struct net *net, @@ -253,9 +259,10 @@ static int nf_tables_register_hook(struc return nf_register_net_hook(net, &basechain->ops); }
-static void nf_tables_unregister_hook(struct net *net, - const struct nft_table *table, - struct nft_chain *chain) +static void __nf_tables_unregister_hook(struct net *net, + const struct nft_table *table, + struct nft_chain *chain, + bool release_netdev) { struct nft_base_chain *basechain; const struct nf_hook_ops *ops; @@ -270,11 +277,19 @@ static void nf_tables_unregister_hook(st return basechain->type->ops_unregister(net, ops);
if (nft_base_chain_netdev(table->family, basechain->ops.hooknum)) - nft_netdev_unregister_hooks(net, &basechain->hook_list); + nft_netdev_unregister_hooks(net, &basechain->hook_list, + release_netdev); else nf_unregister_net_hook(net, &basechain->ops); }
+static void nf_tables_unregister_hook(struct net *net, + const struct nft_table *table, + struct nft_chain *chain) +{ + return __nf_tables_unregister_hook(net, table, chain, false); +} + static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *trans) { struct nftables_pernet *nft_net = nft_pernet(net); @@ -7206,13 +7221,25 @@ static void nft_unregister_flowtable_hoo FLOW_BLOCK_UNBIND); }
-static void nft_unregister_flowtable_net_hooks(struct net *net, - struct list_head *hook_list) +static void __nft_unregister_flowtable_net_hooks(struct net *net, + struct list_head *hook_list, + bool release_netdev) { - struct nft_hook *hook; + struct nft_hook *hook, *next;
- list_for_each_entry(hook, hook_list, list) + list_for_each_entry_safe(hook, next, hook_list, list) { nf_unregister_net_hook(net, &hook->ops); + if (release_netdev) { + list_del(&hook->list); + kfree_rcu(hook); + } + } +} + +static void nft_unregister_flowtable_net_hooks(struct net *net, + struct list_head *hook_list) +{ + __nft_unregister_flowtable_net_hooks(net, hook_list, false); }
static int nft_register_flowtable_net_hooks(struct net *net, @@ -9605,9 +9632,10 @@ static void __nft_release_hook(struct ne struct nft_chain *chain;
list_for_each_entry(chain, &table->chains, list) - nf_tables_unregister_hook(net, table, chain); + __nf_tables_unregister_hook(net, table, chain, true); list_for_each_entry(flowtable, &table->flowtables, list) - nft_unregister_flowtable_net_hooks(net, &flowtable->hook_list); + __nft_unregister_flowtable_net_hooks(net, &flowtable->hook_list, + true); }
static void __nft_release_hooks(struct net *net)
From: Florian Westphal fw@strlen.de
commit 56b14ecec97f39118bf85c9ac2438c5a949509ed upstream.
In case the conntrack is clashing, insertion can free skb->_nfct and set skb->_nfct to the already-confirmed entry.
This wasn't found before because the conntrack entry and the extension space used to free'd after an rcu grace period, plus the race needs events enabled to trigger.
Reported-by: syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_conntrack_core.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/include/net/netfilter/nf_conntrack_core.h +++ b/include/net/netfilter/nf_conntrack_core.h @@ -58,8 +58,13 @@ static inline int nf_conntrack_confirm(s int ret = NF_ACCEPT;
if (ct) { - if (!nf_ct_is_confirmed(ct)) + if (!nf_ct_is_confirmed(ct)) { ret = __nf_conntrack_confirm(skb); + + if (ret == NF_ACCEPT) + ct = (struct nf_conn *)skb_nfct(skb); + } + if (likely(ret == NF_ACCEPT)) nf_ct_deliver_cached_events(ct); }
From: Xiaomeng Tong xiam0nd.tong@gmail.com
commit 300981abddcb13f8f06ad58f52358b53a8096775 upstream.
The bug is here: if (!p) return ret;
The list iterator value 'p' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found.
To fix the bug, Use a new value 'iter' as the list iterator, while use the old value 'p' as a dedicated variable to point to the found element.
Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/book3s_hv_uvmem.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -360,13 +360,15 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsi static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot *memslot, struct kvm *kvm, unsigned long *gfn) { - struct kvmppc_uvmem_slot *p; + struct kvmppc_uvmem_slot *p = NULL, *iter; bool ret = false; unsigned long i;
- list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) - if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns) + list_for_each_entry(iter, &kvm->arch.uvmem_pfns, list) + if (*gfn >= iter->base_pfn && *gfn < iter->base_pfn + iter->nr_pfns) { + p = iter; break; + } if (!p) return ret; /*
From: Sean Christopherson seanjc@google.com
commit 0547758a6de3cc71a0cfdd031a3621a30db6a68b upstream.
Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the the dummy async #PF token, the allocator is preemptible on PREEMPT_RT kernels and must not be called from truly atomic contexts.
Opportunistically document why it's ok to loop on allocation failure, i.e. why the function won't get stuck in an infinite loop.
Reported-by: Yajun Deng yajun.deng@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/kvm.c | 41 +++++++++++++++++++++++++++-------------- 1 file changed, 27 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -188,7 +188,7 @@ void kvm_async_pf_task_wake(u32 token) { u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS); struct kvm_task_sleep_head *b = &async_pf_sleepers[key]; - struct kvm_task_sleep_node *n; + struct kvm_task_sleep_node *n, *dummy = NULL;
if (token == ~0) { apf_task_wake_all(); @@ -200,28 +200,41 @@ again: n = _find_apf_task(b, token); if (!n) { /* - * async PF was not yet handled. - * Add dummy entry for the token. + * Async #PF not yet handled, add a dummy entry for the token. + * Allocating the token must be down outside of the raw lock + * as the allocator is preemptible on PREEMPT_RT kernels. */ - n = kzalloc(sizeof(*n), GFP_ATOMIC); - if (!n) { + if (!dummy) { + raw_spin_unlock(&b->lock); + dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); + /* - * Allocation failed! Busy wait while other cpu - * handles async PF. + * Continue looping on allocation failure, eventually + * the async #PF will be handled and allocating a new + * node will be unnecessary. + */ + if (!dummy) + cpu_relax(); + + /* + * Recheck for async #PF completion before enqueueing + * the dummy token to avoid duplicate list entries. */ - raw_spin_unlock(&b->lock); - cpu_relax(); goto again; } - n->token = token; - n->cpu = smp_processor_id(); - init_swait_queue_head(&n->wq); - hlist_add_head(&n->link, &b->list); + dummy->token = token; + dummy->cpu = smp_processor_id(); + init_swait_queue_head(&dummy->wq); + hlist_add_head(&dummy->link, &b->list); + dummy = NULL; } else { apf_task_wake_one(n); } raw_spin_unlock(&b->lock); - return; + + /* A dummy token might be allocated and ultimately not used. */ + if (dummy) + kfree(dummy); } EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
From: Paolo Bonzini pbonzini@redhat.com
commit baec4f5a018fe2d708fc1022330dba04b38b5fe3 upstream.
Commit ddd7ed842627 ("x86/kvm: Alloc dummy async #PF token outside of raw spinlock") leads to the following Smatch static checker warning:
arch/x86/kernel/kvm.c:212 kvm_async_pf_task_wake() warn: sleeping in atomic context
arch/x86/kernel/kvm.c 202 raw_spin_lock(&b->lock); 203 n = _find_apf_task(b, token); 204 if (!n) { 205 /* 206 * Async #PF not yet handled, add a dummy entry for the token. 207 * Allocating the token must be down outside of the raw lock 208 * as the allocator is preemptible on PREEMPT_RT kernels. 209 */ 210 if (!dummy) { 211 raw_spin_unlock(&b->lock); --> 212 dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); ^^^^^^^^^^ Smatch thinks the caller has preempt disabled. The `smdb.py preempt kvm_async_pf_task_wake` output call tree is:
sysvec_kvm_asyncpf_interrupt() <- disables preempt -> __sysvec_kvm_asyncpf_interrupt() -> kvm_async_pf_task_wake()
The caller is this:
arch/x86/kernel/kvm.c 290 DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt) 291 { 292 struct pt_regs *old_regs = set_irq_regs(regs); 293 u32 token; 294 295 ack_APIC_irq(); 296 297 inc_irq_stat(irq_hv_callback_count); 298 299 if (__this_cpu_read(apf_reason.enabled)) { 300 token = __this_cpu_read(apf_reason.token); 301 kvm_async_pf_task_wake(token); 302 __this_cpu_write(apf_reason.token, 0); 303 wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1); 304 } 305 306 set_irq_regs(old_regs); 307 }
The DEFINE_IDTENTRY_SYSVEC() is a wrapper that calls this function from the call_on_irqstack_cond(). It's inside the call_on_irqstack_cond() where preempt is disabled (unless it's already disabled). The irq_enter/exit_rcu() functions disable/enable preempt.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -206,7 +206,7 @@ again: */ if (!dummy) { raw_spin_unlock(&b->lock); - dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); + dummy = kzalloc(sizeof(*dummy), GFP_ATOMIC);
/* * Continue looping on allocation failure, eventually
From: Sean Christopherson seanjc@google.com
commit fee060cd52d69c114b62d1a2948ea9648b5131f9 upstream.
Whenever x86_decode_emulated_instruction() detects a breakpoint, it returns the value that kvm_vcpu_check_breakpoint() writes into its pass-by-reference second argument. Unfortunately this is completely bogus because the expected outcome of x86_decode_emulated_instruction is an EMULATION_* value.
Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK and x86_emulate_instruction() is called without having decoded the instruction. This causes various havoc from running with a stale emulation context.
The fix is to move the call to kvm_vcpu_check_breakpoint() where it was before commit 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") introduced x86_decode_emulated_instruction(). The other caller of the function does not need breakpoint checks, because it is invoked as part of a vmexit and the processor has already checked those before executing the instruction that #GP'd.
This fixes CVE-2022-1852.
Reported-by: Qiuhao Li qiuhao@sysec.org Reported-by: Gaoning Pan pgn@zju.edu.cn Reported-by: Yongkang Jia kangel@zju.edu.cn Fixes: 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220311032801.3467418-2-seanjc@google.com [Rewrote commit message according to Qiuhao's report, since a patch already existed to fix the bug. - Paolo] Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7846,7 +7846,7 @@ int kvm_skip_emulated_instruction(struct } EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
-static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r) +static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r) { if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) && (vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) { @@ -7915,25 +7915,23 @@ static bool is_vmware_backdoor_opcode(st }
/* - * Decode to be emulated instruction. Return EMULATION_OK if success. + * Decode an instruction for emulation. The caller is responsible for handling + * code breakpoints. Note, manually detecting code breakpoints is unnecessary + * (and wrong) when emulating on an intercepted fault-like exception[*], as + * code breakpoints have higher priority and thus have already been done by + * hardware. + * + * [*] Except #MC, which is higher priority, but KVM should never emulate in + * response to a machine check. */ int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type, void *insn, int insn_len) { - int r = EMULATION_OK; struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + int r;
init_emulate_ctxt(vcpu);
- /* - * We will reenter on the same instruction since we do not set - * complete_userspace_io. This does not handle watchpoints yet, - * those would be handled in the emulate_ops. - */ - if (!(emulation_type & EMULTYPE_SKIP) && - kvm_vcpu_check_breakpoint(vcpu, &r)) - return r; - r = x86_decode_insn(ctxt, insn, insn_len, emulation_type);
trace_kvm_emulate_insn_start(vcpu); @@ -7966,6 +7964,15 @@ int x86_emulate_instruction(struct kvm_v if (!(emulation_type & EMULTYPE_NO_DECODE)) { kvm_clear_exception_queue(vcpu);
+ /* + * Return immediately if RIP hits a code breakpoint, such #DBs + * are fault-like and are higher priority than any faults on + * the code fetch itself. + */ + if (!(emulation_type & EMULTYPE_SKIP) && + kvm_vcpu_check_code_breakpoint(vcpu, &r)) + return r; + r = x86_decode_emulated_instruction(vcpu, emulation_type, insn, insn_len); if (r != EMULATION_OK) {
From: Sean Christopherson seanjc@google.com
commit 45846661d10422ce9e22da21f8277540b29eca22 upstream.
Remove WARNs that sanity check that KVM never lets a triple fault for L2 escape and incorrectly end up in L1. In normal operation, the sanity check is perfectly valid, but it incorrectly assumes that it's impossible for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through KVM_RUN (which guarantees kvm_check_nested_state() will see and handle the triple fault).
The WARN can currently be triggered if userspace injects a machine check while L2 is active and CR4.MCE=0. And a future fix to allow save/restore of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't lost on migration, will make it trivially easy for userspace to trigger the WARN.
Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is tempting, but wrong, especially if/when the request is saved/restored, e.g. if userspace restores events (including a triple fault) and then restores nested state (which may forcibly leave guest mode). Ignoring the fact that KVM doesn't currently provide the necessary APIs, it's userspace's responsibility to manage pending events during save/restore.
------------[ cut here ]------------ WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Modules linked in: kvm_intel kvm irqbypass CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Call Trace: <TASK> vmx_leave_nested+0x30/0x40 [kvm_intel] vmx_set_nested_state+0xca/0x3e0 [kvm_intel] kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm] kvm_vcpu_ioctl+0x4b9/0x660 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ---[ end trace 0000000000000000 ]---
Fixes: cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1") Cc: stable@vger.kernel.org Cc: Chenyi Qiang chenyi.qiang@intel.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220407002315.78092-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/nested.c | 3 --- arch/x86/kvm/vmx/nested.c | 3 --- 2 files changed, 6 deletions(-)
--- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -750,9 +750,6 @@ int nested_svm_vmexit(struct vcpu_svm *s struct kvm_host_map map; int rc;
- /* Triple faults in L2 should never escape. */ - WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)); - rc = kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map); if (rc) { if (rc == -EINVAL) --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4501,9 +4501,6 @@ void nested_vmx_vmexit(struct kvm_vcpu * /* trying to cancel vmlaunch/vmresume is a bug */ WARN_ON_ONCE(vmx->nested.nested_run_pending);
- /* Similarly, triple faults in L2 should never escape. */ - WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)); - if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) { /* * KVM_REQ_GET_NESTED_STATE_PAGES is also used to map
From: Ashish Kalra ashish.kalra@amd.com
commit d22d2474e3953996f03528b84b7f52cc26a39403 upstream.
For some sev ioctl interfaces, the length parameter that is passed maybe less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP firmware returns. In this case, kmalloc will allocate memory that is the size of the input rather than the size of the data. Since PSP firmware doesn't fully overwrite the allocated buffer, these sev ioctl interface may return uninitialized kernel slab memory.
Reported-by: Andy Nguyen theflow@google.com Suggested-by: David Rientjes rientjes@google.com Suggested-by: Peter Gonda pgonda@google.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: eaf78265a4ab3 ("KVM: SVM: Move SEV code to separate file") Fixes: 2c07ded06427d ("KVM: SVM: add support for SEV attestation command") Fixes: 4cfdd47d6d95a ("KVM: SVM: Add KVM_SEV SEND_START command") Fixes: d3d1af85e2c75 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: eba04b20e4861 ("KVM: x86: Account a variety of miscellaneous allocations") Signed-off-by: Ashish Kalra ashish.kalra@amd.com Reviewed-by: Peter Gonda pgonda@google.com Message-Id: 20220516154310.3685678-1-Ashish.Kalra@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/sev.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -676,7 +676,7 @@ static int sev_launch_measure(struct kvm if (params.len > SEV_FW_BLOB_MAX_SIZE) return -EINVAL;
- blob = kmalloc(params.len, GFP_KERNEL_ACCOUNT); + blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT); if (!blob) return -ENOMEM;
@@ -796,7 +796,7 @@ static int __sev_dbg_decrypt_user(struct if (!IS_ALIGNED(dst_paddr, 16) || !IS_ALIGNED(paddr, 16) || !IS_ALIGNED(size, 16)) { - tpage = (void *)alloc_page(GFP_KERNEL); + tpage = (void *)alloc_page(GFP_KERNEL | __GFP_ZERO); if (!tpage) return -ENOMEM;
@@ -1082,7 +1082,7 @@ static int sev_get_attestation_report(st if (params.len > SEV_FW_BLOB_MAX_SIZE) return -EINVAL;
- blob = kmalloc(params.len, GFP_KERNEL_ACCOUNT); + blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT); if (!blob) return -ENOMEM;
@@ -1164,7 +1164,7 @@ static int sev_send_start(struct kvm *kv return -EINVAL;
/* allocate the memory to hold the session data blob */ - session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT); + session_data = kzalloc(params.session_len, GFP_KERNEL_ACCOUNT); if (!session_data) return -ENOMEM;
@@ -1288,11 +1288,11 @@ static int sev_send_update_data(struct k
/* allocate memory for header and transport buffer */ ret = -ENOMEM; - hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT); + hdr = kzalloc(params.hdr_len, GFP_KERNEL_ACCOUNT); if (!hdr) goto e_unpin;
- trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT); + trans_data = kzalloc(params.trans_len, GFP_KERNEL_ACCOUNT); if (!trans_data) goto e_free_hdr;
From: Fabio Estevam festevam@denx.de
commit 4ee4cdad368a26de3967f2975806a9ee2fa245df upstream.
Since commit 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") the following CAAM errors can be seen on i.MX6SX:
caam_jr 2101000.jr: 20003c5b: CCB: desc idx 60: RNG: Hardware error hwrng: no data available
This error is due to an incorrect entropy delay for i.MX6SX.
Fix it by increasing the minimum entropy delay for i.MX6SX as done in U-Boot: https://patchwork.ozlabs.org/project/uboot/patch/20220415111049.2565744-1-ga...
As explained in the U-Boot patch:
"RNG self tests are run to determine the correct entropy delay. Such tests are executed with different voltages and temperatures to identify the worst case value for the entropy delay. For i.MX6SX, it was determined that after adding a margin value of 1000 the minimum entropy delay should be at least 12000."
Cc: stable@vger.kernel.org Fixes: 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") Signed-off-by: Fabio Estevam festevam@denx.de Reviewed-by: Horia Geantă horia.geanta@nxp.com Reviewed-by: Vabhav Sharma vabhav.sharma@nxp.com Reviewed-by: Gaurav Jain gaurav.jain@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/caam/ctrl.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/crypto/caam/ctrl.c +++ b/drivers/crypto/caam/ctrl.c @@ -609,6 +609,13 @@ static bool check_version(struct fsl_mc_ } #endif
+static bool needs_entropy_delay_adjustment(void) +{ + if (of_machine_is_compatible("fsl,imx6sx")) + return true; + return false; +} + /* Probe routine for CAAM top (controller) level */ static int caam_probe(struct platform_device *pdev) { @@ -855,6 +862,8 @@ static int caam_probe(struct platform_de * Also, if a handle was instantiated, do not change * the TRNG parameters. */ + if (needs_entropy_delay_adjustment()) + ent_delay = 12000; if (!(ctrlpriv->rng4_sh_init || inst_handles)) { dev_info(dev, "Entropy delay = %u\n", @@ -871,6 +880,15 @@ static int caam_probe(struct platform_de */ ret = instantiate_rng(dev, inst_handles, gen_sk); + /* + * Entropy delay is determined via TRNG characterization. + * TRNG characterization is run across different voltages + * and temperatures. + * If worst case value for ent_dly is identified, + * the loop can be skipped for that platform. + */ + if (needs_entropy_delay_adjustment()) + break; if (ret == -EAGAIN) /* * if here, the loop will rerun,
From: Vitaly Chikunov vt@altlinux.org
commit 7cc7ab73f83ee6d50dc9536bc3355495d8600fad upstream.
Correctly compare values that shall be greater-or-equal and not just greater.
Fixes: 0d7a78643f69 ("crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithm") Cc: stable@vger.kernel.org Signed-off-by: Vitaly Chikunov vt@altlinux.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/ecrdsa.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/crypto/ecrdsa.c +++ b/crypto/ecrdsa.c @@ -113,15 +113,15 @@ static int ecrdsa_verify(struct akcipher
/* Step 1: verify that 0 < r < q, 0 < s < q */ if (vli_is_zero(r, ndigits) || - vli_cmp(r, ctx->curve->n, ndigits) == 1 || + vli_cmp(r, ctx->curve->n, ndigits) >= 0 || vli_is_zero(s, ndigits) || - vli_cmp(s, ctx->curve->n, ndigits) == 1) + vli_cmp(s, ctx->curve->n, ndigits) >= 0) return -EKEYREJECTED;
/* Step 2: calculate hash (h) of the message (passed as input) */ /* Step 3: calculate e = h \mod q */ vli_from_le64(e, digest, ndigits); - if (vli_cmp(e, ctx->curve->n, ndigits) == 1) + if (vli_cmp(e, ctx->curve->n, ndigits) >= 0) vli_sub(e, e, ctx->curve->n, ndigits); if (vli_is_zero(e, ndigits)) e[0] = 1; @@ -137,7 +137,7 @@ static int ecrdsa_verify(struct akcipher /* Step 6: calculate point C = z_1P + z_2Q, and R = x_c \mod q */ ecc_point_mult_shamir(&cc, z1, &ctx->curve->g, z2, &ctx->pub_key, ctx->curve); - if (vli_cmp(cc.x, ctx->curve->n, ndigits) == 1) + if (vli_cmp(cc.x, ctx->curve->n, ndigits) >= 0) vli_sub(cc.x, cc.x, ctx->curve->n, ndigits);
/* Step 7: if R == r signature is valid */
From: Sultan Alsawaf sultan@kerneltoast.com
commit 2505a981114dcb715f8977b8433f7540854851d8 upstream.
The asynchronous zspage free worker tries to lock a zspage's entire page list without defending against page migration. Since pages which haven't yet been locked can concurrently migrate off the zspage page list while lock_zspage() churns away, lock_zspage() can suffer from a few different lethal races.
It can lock a page which no longer belongs to the zspage and unsafely dereference page_private(), it can unsafely dereference a torn pointer to the next page (since there's a data race), and it can observe a spurious NULL pointer to the next page and thus not lock all of the zspage's pages (since a single page migration will reconstruct the entire page list, and create_page_chain() unconditionally zeroes out each list pointer in the process).
Fix the races by using migrate_read_lock() in lock_zspage() to synchronize with page migration.
Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com Fixes: 77ff465799c602 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse") Signed-off-by: Sultan Alsawaf sultan@kerneltoast.com Acked-by: Minchan Kim minchan@kernel.org Cc: Nitin Gupta ngupta@vflare.org Cc: Sergey Senozhatsky senozhatsky@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/zsmalloc.c | 37 +++++++++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 4 deletions(-)
--- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1743,11 +1743,40 @@ static enum fullness_group putback_zspag */ static void lock_zspage(struct zspage *zspage) { - struct page *page = get_first_page(zspage); + struct page *curr_page, *page;
- do { - lock_page(page); - } while ((page = get_next_page(page)) != NULL); + /* + * Pages we haven't locked yet can be migrated off the list while we're + * trying to lock them, so we need to be careful and only attempt to + * lock each page under migrate_read_lock(). Otherwise, the page we lock + * may no longer belong to the zspage. This means that we may wait for + * the wrong page to unlock, so we must take a reference to the page + * prior to waiting for it to unlock outside migrate_read_lock(). + */ + while (1) { + migrate_read_lock(zspage); + page = get_first_page(zspage); + if (trylock_page(page)) + break; + get_page(page); + migrate_read_unlock(zspage); + wait_on_page_locked(page); + put_page(page); + } + + curr_page = page; + while ((page = get_next_page(curr_page))) { + if (trylock_page(page)) { + curr_page = page; + } else { + get_page(page); + migrate_read_unlock(zspage); + wait_on_page_locked(page); + put_page(page); + migrate_read_lock(zspage); + } + } + migrate_read_unlock(zspage); }
static int zs_init_fs_context(struct fs_context *fc)
From: Takashi Iwai tiwai@suse.de
commit 5ce0b06ae5e69e23142e73c5c3c0260e9f2ccb4b upstream.
Maris reported that TEAC UD-501 (0644:8043) doesn't work with the typical "clock source 41 is not valid, cannot use" errors on the recent kernels. The currently known workaround so far is to restore (partially) what we've done unconditionally at the clock setup; namely, re-setup the USB interface immediately after the clock is changed. This patch re-introduces the behavior conditionally for TEAC devices.
Further notes: - The USB interface shall be set later in snd_usb_endpoint_configure(), but this seems to be too late. - Even calling usb_set_interface() right after sne_usb_init_sample_rate() doesn't help; so this must be related with the clock validation, too. - The device may still spew the "clock source 41 is not valid" error at the first clock setup. This seems happening at the very first try of clock setup, but it disappears at later attempts. The error is likely harmless because the driver retries the clock setup (such an error is more or less expected on some devices).
Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") Reported-and-tested-by: Maris Abele maris7abele@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220521064627.29292-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/clock.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/sound/usb/clock.c +++ b/sound/usb/clock.c @@ -572,6 +572,13 @@ static int set_sample_rate_v2v3(struct s /* continue processing */ }
+ /* FIXME - TEAC devices require the immediate interface setup */ + if (rate != prev_rate && USB_ID_VENDOR(chip->usb_id) == 0x0644) { + usb_set_interface(chip->dev, fmt->iface, fmt->altsetting); + if (chip->quirk_flags & QUIRK_FLAG_IFACE_DELAY) + msleep(50); + } + validation: /* validate clock after rate change */ if (!uac_clock_source_is_valid(chip, fmt, clock))
From: Takashi Iwai tiwai@suse.de
commit 7b0efea4baf02f5e2f89e5f9b75ef891571b45f1 upstream.
The quirk entry for Focusrite Saffire 6 had no proper ep_idx for the capture endpoint, and this confused the driver, resulting in the broken sound. This patch adds the missing ep_idx in the entry.
While we are at it, a couple of other entries (for Digidesign MBox and MOTU MicroBook II) seem to have the same problem, and those are covered as well.
Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") Reported-by: André Kapelrud a.kapelrud@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220521065325.426-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks-table.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -2672,6 +2672,7 @@ YAMAHA_DEVICE(0x7010, "UB99"), .altset_idx = 1, .attributes = 0, .endpoint = 0x82, + .ep_idx = 1, .ep_attr = USB_ENDPOINT_XFER_ISOC, .datainterval = 1, .maxpacksize = 0x0126, @@ -2875,6 +2876,7 @@ YAMAHA_DEVICE(0x7010, "UB99"), .altset_idx = 1, .attributes = 0x4, .endpoint = 0x81, + .ep_idx = 1, .ep_attr = USB_ENDPOINT_XFER_ISOC | USB_ENDPOINT_SYNC_ASYNC, .maxpacksize = 0x130, @@ -3391,6 +3393,7 @@ YAMAHA_DEVICE(0x7010, "UB99"), .altset_idx = 1, .attributes = 0, .endpoint = 0x03, + .ep_idx = 1, .rates = SNDRV_PCM_RATE_96000, .ep_attr = USB_ENDPOINT_XFER_ISOC | USB_ENDPOINT_SYNC_ASYNC,
From: Craig McLure craig@mclure.net
commit 0e85a22d01dfe9ad9a9d9e87cd4a88acce1aad65 upstream.
Devices such as the TC-Helicon GoXLR require the sync endpoint to be configured in advance of the data endpoint in order for sound output to work.
This patch simply changes the ordering of EP configuration to resolve this.
Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215079 Signed-off-by: Craig McLure craig@mclure.net Reviewed-by: Jaroslav Kysela perex@perex.cz Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220524062115.25968-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/pcm.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c index 6d699065e81a..b470404a5376 100644 --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -439,16 +439,21 @@ static int configure_endpoints(struct snd_usb_audio *chip, /* stop any running stream beforehand */ if (stop_endpoints(subs, false)) sync_pending_stops(subs); + if (subs->sync_endpoint) { + err = snd_usb_endpoint_configure(chip, subs->sync_endpoint); + if (err < 0) + return err; + } err = snd_usb_endpoint_configure(chip, subs->data_endpoint); if (err < 0) return err; snd_usb_set_format_quirk(subs, subs->cur_audiofmt); - } - - if (subs->sync_endpoint) { - err = snd_usb_endpoint_configure(chip, subs->sync_endpoint); - if (err < 0) - return err; + } else { + if (subs->sync_endpoint) { + err = snd_usb_endpoint_configure(chip, subs->sync_endpoint); + if (err < 0) + return err; + } }
return 0;
From: Steven Rostedt rostedt@goodmis.org
commit 72ef98445aca568a81c2da050532500a8345ad3a upstream.
While looking at a crash report on a timer list being corrupted, which usually happens when a timer is freed while still active. This is commonly triggered by code calling del_timer() instead of del_timer_sync() just before freeing.
One possible culprit is the hci_qca driver, which does exactly that.
Eric mentioned that wake_retrans_timer could be rearmed via the work queue, so also move the destruction of the work queue before del_timer_sync().
Cc: Eric Dumazet eric.dumazet@gmail.com Cc: stable@vger.kernel.org Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bluetooth/hci_qca.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -696,9 +696,9 @@ static int qca_close(struct hci_uart *hu skb_queue_purge(&qca->tx_wait_q); skb_queue_purge(&qca->txq); skb_queue_purge(&qca->rx_memdump_q); - del_timer(&qca->tx_idle_timer); - del_timer(&qca->wake_retrans_timer); destroy_workqueue(qca->workqueue); + del_timer_sync(&qca->tx_idle_timer); + del_timer_sync(&qca->wake_retrans_timer); qca->hu = NULL;
kfree_skb(qca->rx_skb);
From: Jonathan Bakker xc-racer2@live.ca
commit 3f5e3d3a8b895c8a11da8b0063ba2022dd9e2045 upstream.
Correct the name of the bluetooth interrupt from host-wake to host-wakeup.
Fixes: 1c65b6184441b ("ARM: dts: s5pv210: Correct BCM4329 bluetooth node") Cc: stable@vger.kernel.org Signed-off-by: Jonathan Bakker xc-racer2@live.ca Link: https://lore.kernel.org/r/CY4PR04MB0567495CFCBDC8D408D44199CB1C9@CY4PR04MB05... Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/s5pv210-aries.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/s5pv210-aries.dtsi +++ b/arch/arm/boot/dts/s5pv210-aries.dtsi @@ -895,7 +895,7 @@ device-wakeup-gpios = <&gpg3 4 GPIO_ACTIVE_HIGH>; interrupt-parent = <&gph2>; interrupts = <5 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "host-wake"; + interrupt-names = "host-wakeup"; }; };
From: Dan Carpenter dan.carpenter@oracle.com
commit d3f2a14b8906df913cb04a706367b012db94a6e8 upstream.
The "r" variable shadows an earlier "r" that has function scope. It means that we accidentally return success instead of an error code. Smatch has a warning for this:
drivers/md/dm-integrity.c:4503 dm_integrity_ctr() warn: missing error code 'r'
Fixes: 7eada909bfd7 ("dm: add integrity target") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-integrity.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -4478,8 +4478,6 @@ try_smaller_buffer: }
if (should_write_sb) { - int r; - init_journal(ic, 0, ic->journal_sections, 0); r = dm_integrity_failed(ic); if (unlikely(r)) {
From: Mikulas Patocka mpatocka@redhat.com
commit 567dd8f34560fa221a6343729474536aa7ede4fd upstream.
The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to report the current key to userspace. However, this is not a constant-time operation and it may leak information about the key via timing, via cache access patterns or via the branch predictor.
Change dm-crypt's key printing to use "%c" instead of "%02x". Also introduce hex2asc() that carefully avoids any branching or memory accesses when converting a number in the range 0 ... 15 to an ascii character.
Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka mpatocka@redhat.com Tested-by: Milan Broz gmazyland@gmail.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-crypt.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -3435,6 +3435,11 @@ static int crypt_map(struct dm_target *t return DM_MAPIO_SUBMITTED; }
+static char hex2asc(unsigned char c) +{ + return c + '0' + ((unsigned)(9 - c) >> 4 & 0x27); +} + static void crypt_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen) { @@ -3453,9 +3458,12 @@ static void crypt_status(struct dm_targe if (cc->key_size > 0) { if (cc->key_string) DMEMIT(":%u:%s", cc->key_size, cc->key_string); - else - for (i = 0; i < cc->key_size; i++) - DMEMIT("%02x", cc->key[i]); + else { + for (i = 0; i < cc->key_size; i++) { + DMEMIT("%c%c", hex2asc(cc->key[i] >> 4), + hex2asc(cc->key[i] & 0xf)); + } + } } else DMEMIT("-");
From: Mikulas Patocka mpatocka@redhat.com
commit bfe2b0146c4d0230b68f5c71a64380ff8d361f8b upstream.
dm-stats can be used with a very large number of entries (it is only limited by 1/4 of total system memory), so add rescheduling points to the loops that iterate over the entries.
Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-stats.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/md/dm-stats.c +++ b/drivers/md/dm-stats.c @@ -225,6 +225,7 @@ void dm_stats_cleanup(struct dm_stats *s atomic_read(&shared->in_flight[READ]), atomic_read(&shared->in_flight[WRITE])); } + cond_resched(); } dm_stat_free(&s->rcu_head); } @@ -330,6 +331,7 @@ static int dm_stats_create(struct dm_sta for (ni = 0; ni < n_entries; ni++) { atomic_set(&s->stat_shared[ni].in_flight[READ], 0); atomic_set(&s->stat_shared[ni].in_flight[WRITE], 0); + cond_resched(); }
if (s->n_histogram_entries) { @@ -342,6 +344,7 @@ static int dm_stats_create(struct dm_sta for (ni = 0; ni < n_entries; ni++) { s->stat_shared[ni].tmp.histogram = hi; hi += s->n_histogram_entries + 1; + cond_resched(); } }
@@ -362,6 +365,7 @@ static int dm_stats_create(struct dm_sta for (ni = 0; ni < n_entries; ni++) { p[ni].histogram = hi; hi += s->n_histogram_entries + 1; + cond_resched(); } } } @@ -497,6 +501,7 @@ static int dm_stats_list(struct dm_stats } DMEMIT("\n"); } + cond_resched(); } mutex_unlock(&stats->mutex);
@@ -774,6 +779,7 @@ static void __dm_stat_clear(struct dm_st local_irq_enable(); } } + cond_resched(); } }
@@ -889,6 +895,8 @@ static int dm_stats_print(struct dm_stat
if (unlikely(sz + 1 >= maxlen)) goto buffer_overflow; + + cond_resched(); }
if (clear)
From: Sarthak Kukreti sarthakkukreti@google.com
commit 4caae58406f8ceb741603eee460d79bacca9b1b5 upstream.
The device-mapper framework provides a mechanism to mark targets as immutable (and hence fail table reloads that try to change the target type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's feature flags to prevent switching the verity target with a different target type.
Fixes: a4ffc152198e ("dm: add verity target") Cc: stable@vger.kernel.org Signed-off-by: Sarthak Kukreti sarthakkukreti@google.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-verity-target.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -1312,6 +1312,7 @@ bad:
static struct target_type verity_target = { .name = "verity", + .features = DM_TARGET_IMMUTABLE, .version = {1, 8, 0}, .module = THIS_MODULE, .ctr = verity_ctr,
From: Mariusz Tkaczyk mariusz.tkaczyk@linux.intel.com
commit 57668f0a4cc4083a120cc8c517ca0055c4543b59 upstream.
Raid456 module had allowed to achieve failed state. It was fixed by fb73b357fb9 ("raid5: block failing device if raid will be failed"). This fix introduces a bug, now if raid5 fails during IO, it may result with a hung task without completion. Faulty flag on the device is necessary to process all requests and is checked many times, mainly in analyze_stripe(). Allow to set faulty on drive again and set MD_BROKEN if raid is failed.
As a result, this level is allowed to achieve failed state again, but communication with userspace (via -EBUSY status) will be preserved.
This restores possibility to fail array via #mdadm --set-faulty command and will be fixed by additional verification on mdadm side.
Reproduction steps: mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean mkfs.xfs /dev/md126 -f mount /dev/md126 /mnt/root/
fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 &
echo 1 > /sys/block/nvme2n1/device/device/remove echo 1 > /sys/block/nvme1n1/device/device/remove
[ 1475.787779] Call Trace: [ 1475.793111] __schedule+0x2a6/0x700 [ 1475.799460] schedule+0x38/0xa0 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456] [ 1475.813856] ? finish_wait+0x80/0x80 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456] [ 1475.828281] ? finish_wait+0x80/0x80 [ 1475.834727] ? finish_wait+0x80/0x80 [ 1475.841127] ? finish_wait+0x80/0x80 [ 1475.847480] md_handle_request+0x119/0x190 [ 1475.854390] md_make_request+0x8a/0x190 [ 1475.861041] generic_make_request+0xcf/0x310 [ 1475.868145] submit_bio+0x3c/0x160 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390 [ 1475.889149] iomap_apply+0xff/0x310 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.909974] iomap_dio_rw+0x2f2/0x490 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.923680] ? atime_needs_update+0x77/0xe0 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs] [ 1475.953403] aio_read+0xd5/0x180 [ 1475.959395] ? _cond_resched+0x15/0x30 [ 1475.965907] io_submit_one+0x20b/0x3c0 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180 [ 1475.979335] ? do_io_getevents+0x7c/0xc0 [ 1475.986009] do_syscall_64+0x5b/0x1a0 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 1476.000255] RIP: 0033:0x7f11fc27978d [ 1476.006631] Code: Bad RIP value. [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
Cc: stable@vger.kernel.org Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed") Reviewd-by: Xiao Ni xni@redhat.com Signed-off-by: Mariusz Tkaczyk mariusz.tkaczyk@linux.intel.com Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/raid5.c | 47 ++++++++++++++++++++++------------------------- 1 file changed, 22 insertions(+), 25 deletions(-)
--- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -686,17 +686,17 @@ int raid5_calc_degraded(struct r5conf *c return degraded; }
-static int has_failed(struct r5conf *conf) +static bool has_failed(struct r5conf *conf) { - int degraded; + int degraded = conf->mddev->degraded;
- if (conf->mddev->reshape_position == MaxSector) - return conf->mddev->degraded > conf->max_degraded; + if (test_bit(MD_BROKEN, &conf->mddev->flags)) + return true;
- degraded = raid5_calc_degraded(conf); - if (degraded > conf->max_degraded) - return 1; - return 0; + if (conf->mddev->reshape_position != MaxSector) + degraded = raid5_calc_degraded(conf); + + return degraded > conf->max_degraded; }
struct stripe_head * @@ -2877,34 +2877,31 @@ static void raid5_error(struct mddev *md unsigned long flags; pr_debug("raid456: error called\n");
+ pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n", + mdname(mddev), bdevname(rdev->bdev, b)); + spin_lock_irqsave(&conf->device_lock, flags); + set_bit(Faulty, &rdev->flags); + clear_bit(In_sync, &rdev->flags); + mddev->degraded = raid5_calc_degraded(conf);
- if (test_bit(In_sync, &rdev->flags) && - mddev->degraded == conf->max_degraded) { - /* - * Don't allow to achieve failed state - * Don't try to recover this device - */ + if (has_failed(conf)) { + set_bit(MD_BROKEN, &conf->mddev->flags); conf->recovery_disabled = mddev->recovery_disabled; - spin_unlock_irqrestore(&conf->device_lock, flags); - return; + + pr_crit("md/raid:%s: Cannot continue operation (%d/%d failed).\n", + mdname(mddev), mddev->degraded, conf->raid_disks); + } else { + pr_crit("md/raid:%s: Operation continuing on %d devices.\n", + mdname(mddev), conf->raid_disks - mddev->degraded); }
- set_bit(Faulty, &rdev->flags); - clear_bit(In_sync, &rdev->flags); - mddev->degraded = raid5_calc_degraded(conf); spin_unlock_irqrestore(&conf->device_lock, flags); set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(Blocked, &rdev->flags); set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING)); - pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n" - "md/raid:%s: Operation continuing on %d devices.\n", - mdname(mddev), - bdevname(rdev->bdev, b), - mdname(mddev), - conf->raid_disks - mddev->degraded); r5c_update_on_rdev_error(mddev, rdev); }
From: Randy Dunlap rdunlap@infradead.org
commit a3b774342fa752a5290c0de36375289dfcf4a260 upstream.
When the NTFS BOOT sectors_per_clusters field is > 0x80, it represents a shift value. Make sure that the shift value is not too large before using it (NTFS max cluster size is 2MB). Return -EVINVAL if it too large.
This prevents negative shift values and shift values that are larger than the field size.
Prevents this UBSAN error:
UBSAN: shift-out-of-bounds in ../fs/ntfs3/super.c:673:16 shift exponent -192 is negative
Link: https://lkml.kernel.org/r/20220502175342.20296-1-rdunlap@infradead.org Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: syzbot+1631f09646bc214d2e76@syzkaller.appspotmail.com Reviewed-by: Namjae Jeon linkinjeon@kernel.org Cc: Konstantin Komarov almaz.alexandrovich@paragon-software.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Kari Argillander kari.argillander@stargateuniverse.net Cc: Namjae Jeon linkinjeon@kernel.org Cc: Matthew Wilcox willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ntfs3/super.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -668,9 +668,11 @@ static u32 format_size_gb(const u64 byte
static u32 true_sectors_per_clst(const struct NTFS_BOOT *boot) { - return boot->sectors_per_clusters <= 0x80 - ? boot->sectors_per_clusters - : (1u << (0 - boot->sectors_per_clusters)); + if (boot->sectors_per_clusters <= 0x80) + return boot->sectors_per_clusters; + if (boot->sectors_per_clusters >= 0xf4) /* limit shift to 2MB max */ + return 1U << (0 - boot->sectors_per_clusters); + return -EINVAL; }
/* @@ -713,6 +715,8 @@ static int ntfs_init_from_boot(struct su
/* cluster size: 512, 1K, 2K, 4K, ... 2M */ sct_per_clst = true_sectors_per_clst(boot); + if ((int)sct_per_clst < 0) + goto out; if (!is_power_of_2(sct_per_clst)) goto out;
From: Marek Maślanka mm@semihalf.com
commit 1d07cef7fd7599450b3d03e1915efc2a96e1f03f upstream.
The Google Whiskers touchpad does not work properly with the default multitouch configuration. Instead, use the same configuration as Google Rose.
Signed-off-by: Marek Maslanka mm@semihalf.com Acked-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-multitouch.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -2176,6 +2176,9 @@ static const struct hid_device_id mt_dev { .driver_data = MT_CLS_GOOGLE, HID_DEVICE(HID_BUS_ANY, HID_GROUP_ANY, USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_TOUCH_ROSE) }, + { .driver_data = MT_CLS_GOOGLE, + HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, USB_VENDOR_ID_GOOGLE, + USB_DEVICE_ID_GOOGLE_WHISKERS) },
/* Generic MT device */ { HID_DEVICE(HID_BUS_ANY, HID_GROUP_MULTITOUCH, HID_ANY_ID, HID_ANY_ID) },
From: Tao Jin tao-j@outlook.com
commit 95cd2cdc88c755dcd0a58b951faeb77742c733a4 upstream.
This applies the similar quirks used by previous generation devices such as X1 tablet for X12 tablet, so that the trackpoint and buttons can work.
This patch was applied and tested working on 5.17.1 .
Cc: stable@vger.kernel.org # 5.8+ given that it relies on 40d5bb87377a Signed-off-by: Tao Jin tao-j@outlook.com Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Link: https://lore.kernel.org/r/CO6PR03MB6241CB276FCDC7F4CEDC34F6E1E29@CO6PR03MB62... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-multitouch.c | 6 ++++++ 2 files changed, 7 insertions(+)
--- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -753,6 +753,7 @@ #define USB_DEVICE_ID_LENOVO_X1_COVER 0x6085 #define USB_DEVICE_ID_LENOVO_X1_TAB 0x60a3 #define USB_DEVICE_ID_LENOVO_X1_TAB3 0x60b5 +#define USB_DEVICE_ID_LENOVO_X12_TAB 0x60fe #define USB_DEVICE_ID_LENOVO_OPTICAL_USB_MOUSE_600E 0x600e #define USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_608D 0x608d #define USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_6019 0x6019 --- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -2032,6 +2032,12 @@ static const struct hid_device_id mt_dev USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_TAB3) },
+ /* Lenovo X12 TAB Gen 1 */ + { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT, + HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, + USB_VENDOR_ID_LENOVO, + USB_DEVICE_ID_LENOVO_X12_TAB) }, + /* MosArt panels */ { .driver_data = MT_CLS_CONFIDENCE_MINUS_ONE, MT_USB_DEVICE(USB_VENDOR_ID_ASUS,
From: Reinette Chatre reinette.chatre@intel.com
commit 6bd429643cc265e94a9d19839c771bcc5d008fa8 upstream.
SGX uses shmem backing storage to store encrypted enclave pages and their crypto metadata when enclave pages are moved out of enclave memory. Two shmem backing storage pages are associated with each enclave page - one backing page to contain the encrypted enclave page data and one backing page (shared by a few enclave pages) to contain the crypto metadata used by the processor to verify the enclave page when it is loaded back into the enclave.
sgx_encl_put_backing() is used to release references to the backing storage and, optionally, mark both backing store pages as dirty.
Managing references and dirty status together in this way results in both backing store pages marked as dirty, even if only one of the backing store pages are changed.
Additionally, waiting until the page reference is dropped to set the page dirty risks a race with the page fault handler that may load outdated data into the enclave when a page is faulted right after it is reclaimed.
Consider what happens if the reclaimer writes a page to the backing store and the page is immediately faulted back, before the reclaimer is able to set the dirty bit of the page:
sgx_reclaim_pages() { sgx_vma_fault() { ... sgx_encl_get_backing(); ... ... sgx_reclaimer_write() { mutex_lock(&encl->lock); /* Write data to backing store */ mutex_unlock(&encl->lock); } mutex_lock(&encl->lock); __sgx_encl_eldu() { ... /* * Enclave backing store * page not released * nor marked dirty - * contents may not be * up to date. */ sgx_encl_get_backing(); ... /* * Enclave data restored * from backing store * and PCMD pages that * are not up to date. * ENCLS[ELDU] faults * because of MAC or PCMD * checking failure. */ sgx_encl_put_backing(); } ... /* set page dirty */ sgx_encl_put_backing(); ... mutex_unlock(&encl->lock); } }
Remove the option to sgx_encl_put_backing() to set the backing pages as dirty and set the needed pages as dirty right after receiving important data while enclave mutex is held. This ensures that the page fault handler can get up to date data from a page and prepares the code for a following change where only one of the backing pages need to be marked as dirty.
Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel... Link: https://lkml.kernel.org/r/fa9f98986923f43e72ef4c6702a50b2a0b3c42e3.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 10 ++-------- arch/x86/kernel/cpu/sgx/encl.h | 2 +- arch/x86/kernel/cpu/sgx/main.c | 6 ++++-- 3 files changed, 7 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -94,7 +94,7 @@ static int __sgx_encl_eldu(struct sgx_en kunmap_atomic(pcmd_page); kunmap_atomic((void *)(unsigned long)pginfo.contents);
- sgx_encl_put_backing(&b, false); + sgx_encl_put_backing(&b);
sgx_encl_truncate_backing_page(encl, page_index);
@@ -645,15 +645,9 @@ int sgx_encl_get_backing(struct sgx_encl /** * sgx_encl_put_backing() - Unpin the backing storage * @backing: data for accessing backing storage for the page - * @do_write: mark pages dirty */ -void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write) +void sgx_encl_put_backing(struct sgx_backing *backing) { - if (do_write) { - set_page_dirty(backing->pcmd); - set_page_dirty(backing->contents); - } - put_page(backing->pcmd); put_page(backing->contents); } --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -107,7 +107,7 @@ void sgx_encl_release(struct kref *ref); int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm); int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, struct sgx_backing *backing); -void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write); +void sgx_encl_put_backing(struct sgx_backing *backing); int sgx_encl_test_and_clear_young(struct mm_struct *mm, struct sgx_encl_page *page);
--- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -170,6 +170,8 @@ static int __sgx_encl_ewb(struct sgx_epc backing->pcmd_offset;
ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot); + set_page_dirty(backing->pcmd); + set_page_dirty(backing->contents);
kunmap_atomic((void *)(unsigned long)(pginfo.metadata - backing->pcmd_offset)); @@ -299,7 +301,7 @@ static void sgx_reclaimer_write(struct s sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL;
- sgx_encl_put_backing(&secs_backing, true); + sgx_encl_put_backing(&secs_backing); }
out: @@ -392,7 +394,7 @@ skip:
encl_page = epc_page->owner; sgx_reclaimer_write(epc_page, &backing[i]); - sgx_encl_put_backing(&backing[i], true); + sgx_encl_put_backing(&backing[i]);
kref_put(&encl_page->encl->refcount, sgx_encl_release); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
From: Reinette Chatre reinette.chatre@intel.com
commit 2154e1c11b7080aa19f47160bd26b6f39bbd7824 upstream.
Recent commit 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") expanded __sgx_encl_eldu() to clear an enclave page's PCMD (Paging Crypto MetaData) from the PCMD page in the backing store after the enclave page is restored to the enclave.
Since the PCMD page in the backing store is modified the page should be marked as dirty to ensure the modified data is retained.
Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/00cd2ac480db01058d112e347b32599c1a806bc4.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -84,6 +84,7 @@ static int __sgx_encl_eldu(struct sgx_en }
memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd)); + set_page_dirty(b.pcmd);
/* * The area for the PCMD in the page was zeroed above. Check if the
From: Reinette Chatre reinette.chatre@intel.com
commit 0e4e729a830c1e7f31d3b3fbf8feb355a402b117 upstream.
Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP.
The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away.
The SGX backing storage is accessed on two paths: when there are insufficient free pages in the EPC the reclaimer works to move enclave pages to the backing storage and as enclaves access pages that have been moved to the backing storage they are retrieved from there as part of page fault handling.
An oversubscribed SGX system will often run the reclaimer and page fault handler concurrently and needs to ensure that the backing store is accessed safely between the reclaimer and the page fault handler. This is not the case because the reclaimer accesses the backing store without the enclave mutex while the page fault handler accesses the backing store with the enclave mutex.
Consider the scenario where a page is faulted while a page sharing a PCMD page with the faulted page is being reclaimed. The consequence is a race between the reclaimer and page fault handler, the reclaimer attempting to access a PCMD at the same time it is truncated by the page fault handler. This could result in lost PCMD data. Data may still be lost if the reclaimer wins the race, this is addressed in the following patch.
The reclaimer accesses pages from the backing storage without holding the enclave mutex and runs the risk of concurrently accessing the backing storage with the page fault handler that does access the backing storage with the enclave mutex held.
In the scenario below a PCMD page is truncated from the backing store after all its pages have been loaded in to the enclave at the same time the PCMD page is loaded from the backing store when one of its pages are reclaimed:
sgx_reclaim_pages() { sgx_vma_fault() { ... mutex_lock(&encl->lock); ... __sgx_encl_eldu() { ... if (pcmd_page_empty) { /* * EPC page being reclaimed /* * shares a PCMD page with an * PCMD page truncated * enclave page that is being * while requested from * faulted in. * reclaimer. */ */ sgx_encl_get_backing() <----------> sgx_encl_truncate_backing_page() } mutex_unlock(&encl->lock); } }
In this scenario there is a race between the reclaimer and the page fault handler when the reclaimer attempts to get access to the same PCMD page that is being truncated. This could result in the reclaimer writing to the PCMD page that is then truncated, causing the PCMD data to be lost, or in a new PCMD page being allocated. The lost PCMD data may still occur after protecting the backing store access with the mutex - this is fixed in the next patch. By ensuring the backing store is accessed with the mutex held the enclave page state can be made accurate with the SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page is in the process of being reclaimed.
Consistently protect the reclaimer's backing store access with the enclave's mutex to ensure that it can safely run concurrently with the page fault handler.
Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Haitao Huang haitao.huang@intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/fa2e04c561a8555bfe1f4e7adc37d60efc77387b.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -289,6 +289,7 @@ static void sgx_reclaimer_write(struct s sgx_encl_ewb(epc_page, backing); encl_page->epc_page = NULL; encl->secs_child_cnt--; + sgx_encl_put_backing(backing);
if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) { ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size), @@ -362,11 +363,14 @@ static void sgx_reclaim_pages(void) goto skip;
page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); + + mutex_lock(&encl_page->encl->lock); ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]); - if (ret) + if (ret) { + mutex_unlock(&encl_page->encl->lock); goto skip; + }
- mutex_lock(&encl_page->encl->lock); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl_page->encl->lock); continue; @@ -394,7 +398,6 @@ skip:
encl_page = epc_page->owner; sgx_reclaimer_write(epc_page, &backing[i]); - sgx_encl_put_backing(&backing[i]);
kref_put(&encl_page->encl->refcount, sgx_encl_release); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
From: Reinette Chatre reinette.chatre@intel.com
commit af117837ceb9a78e995804ade4726ad2c2c8981f upstream.
Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP.
The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away.
Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains just one entry, that of ENCLAVE_B.
Scenario proceeds where ENCLAVE_A is being evicted from the enclave while ENCLAVE_B is faulted in.
sgx_reclaim_pages() {
...
/* * Reclaim ENCLAVE_A */ mutex_lock(&encl->lock); /* * Get a reference to ENCLAVE_A's * shmem page where enclave page * encrypted data will be stored * as well as a reference to the * enclave page's PCMD data page, * PCMD_AB. * Release mutex before writing * any data to the shmem pages. */ sgx_encl_get_backing(...); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl->lock);
/* * Fault ENCLAVE_B */
sgx_vma_fault() {
mutex_lock(&encl->lock); /* * Get reference to * ENCLAVE_B's shmem page * as well as PCMD_AB. */ sgx_encl_get_backing(...) /* * Load page back into * enclave via ELDU. */ /* * Release reference to * ENCLAVE_B' shmem page and * PCMD_AB. */ sgx_encl_put_backing(...); /* * PCMD_AB is found empty so * it and ENCLAVE_B's shmem page * are truncated. */ /* Truncate ENCLAVE_B backing page */ sgx_encl_truncate_backing_page(); /* Truncate PCMD_AB */ sgx_encl_truncate_backing_page();
mutex_unlock(&encl->lock);
... } mutex_lock(&encl->lock); encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED; /* * Write encrypted contents of * ENCLAVE_A to ENCLAVE_A shmem * page and its PCMD data to * PCMD_AB. */ sgx_encl_put_backing(...)
/* * Reference to PCMD_AB is * dropped and it is truncated. * ENCLAVE_A's PCMD data is lost. */ mutex_unlock(&encl->lock); }
What happens next depends on whether it is ENCLAVE_A being faulted in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting with a #GP.
If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called a new PCMD page is allocated and providing the empty PCMD data for ENCLAVE_A would cause ENCLS[ELDU] to #GP
If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP during its checks of the PCMD value and the WARN would be encountered.
Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a reference to the backing store pages of an enclave page it is in the process of reclaiming, fix the race by only truncating the PCMD page after ensuring that no page sharing the PCMD page is in the process of being reclaimed.
Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Reported-by: Haitao Huang haitao.huang@intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/ed20a5db516aa813873268e125680041ae11dfcf.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 94 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 93 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -12,6 +12,92 @@ #include "encls.h" #include "sgx.h"
+#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd)) +/* + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to + * determine the page index associated with the first PCMD entry + * within a PCMD page. + */ +#define PCMD_FIRST_MASK GENMASK(4, 0) + +/** + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with + * a PCMD page is in process of being reclaimed. + * @encl: Enclave to which PCMD page belongs + * @start_addr: Address of enclave page using first entry within the PCMD page + * + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is + * stored. The PCMD data of a reclaimed enclave page contains enough + * information for the processor to verify the page at the time + * it is loaded back into the Enclave Page Cache (EPC). + * + * The backing storage to which enclave pages are reclaimed is laid out as + * follows: + * Encrypted enclave pages:SECS page:PCMD pages + * + * Each PCMD page contains the PCMD metadata of + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages. + * + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the + * process of getting data (and thus soon being non-empty). (b) is tested with + * a check if an enclave page sharing the PCMD page is in the process of being + * reclaimed. + * + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it + * intends to reclaim that enclave page - it means that the PCMD page + * associated with that enclave page is about to get some data and thus + * even if the PCMD page is empty, it should not be truncated. + * + * Context: Enclave mutex (&sgx_encl->lock) must be held. + * Return: 1 if the reclaimer is about to write to the PCMD page + * 0 if the reclaimer has no intention to write to the PCMD page + */ +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl, + unsigned long start_addr) +{ + int reclaimed = 0; + int i; + + /* + * PCMD_FIRST_MASK is based on number of PCMD entries within + * PCMD page being 32. + */ + BUILD_BUG_ON(PCMDS_PER_PAGE != 32); + + for (i = 0; i < PCMDS_PER_PAGE; i++) { + struct sgx_encl_page *entry; + unsigned long addr; + + addr = start_addr + i * PAGE_SIZE; + + /* + * Stop when reaching the SECS page - it does not + * have a page_array entry and its reclaim is + * started and completed with enclave mutex held so + * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED + * flag. + */ + if (addr == encl->base + encl->size) + break; + + entry = xa_load(&encl->page_array, PFN_DOWN(addr)); + if (!entry) + continue; + + /* + * VA page slot ID uses same bit as the flag so it is important + * to ensure that the page is not already in backing store. + */ + if (entry->epc_page && + (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) { + reclaimed = 1; + break; + } + } + + return reclaimed; +} + /* * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's * follow right after the EPC data in the backing storage. In addition to the @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_en unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK; struct sgx_encl *encl = encl_page->encl; pgoff_t page_index, page_pcmd_off; + unsigned long pcmd_first_page; struct sgx_pageinfo pginfo; struct sgx_backing b; bool pcmd_page_empty; @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_en else page_index = PFN_DOWN(encl->size);
+ /* + * Address of enclave page using the first entry within the PCMD page. + */ + pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base; + page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
ret = sgx_encl_get_backing(encl, page_index, &b); @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_en
sgx_encl_truncate_backing_page(encl, page_index);
- if (pcmd_page_empty) + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
return ret;
From: Reinette Chatre reinette.chatre@intel.com
commit e3a3bbe3e99de73043a1d32d36cf4d211dc58c7e upstream.
A PCMD (Paging Crypto MetaData) page contains the PCMD structures of enclave pages that have been encrypted and moved to the shmem backing store. When all enclave pages sharing a PCMD page are loaded in the enclave, there is no need for the PCMD page and it can be truncated from the backing store.
A few issues appeared around the truncation of PCMD pages. The known issues have been addressed but the PCMD handling code could be made more robust by loudly complaining if any new issue appears in this area.
Add a check that will complain with a warning if the PCMD page is not actually empty after it has been truncated. There should never be data in the PCMD page at this point since it is was just checked to be empty and truncated with enclave mutex held and is updated with the enclave mutex held.
Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/6495120fed43fafc1496d09dd23df922b9a32709.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -187,12 +187,20 @@ static int __sgx_encl_eldu(struct sgx_en kunmap_atomic(pcmd_page); kunmap_atomic((void *)(unsigned long)pginfo.contents);
+ get_page(b.pcmd); sgx_encl_put_backing(&b);
sgx_encl_truncate_backing_page(encl, page_index);
- if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) { sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off)); + pcmd_page = kmap_atomic(b.pcmd); + if (memchr_inv(pcmd_page, 0, PAGE_SIZE)) + pr_warn("PCMD page not empty after truncate.\n"); + kunmap_atomic(pcmd_page); + } + + put_page(b.pcmd);
return ret; }
From: Bryan O'Donoghue bryan.odonoghue@linaro.org
commit bb25f071fc92d3d227178a45853347c7b3b45a6b upstream.
The imx412/imx577 sensor has a reset line that is active low not active high. Currently the logic for this is inverted.
The right way to define the reset line is to declare it active low in the DTS and invert the logic currently contained in the driver.
The DTS should represent the hardware does i.e. reset is active low. So: + reset-gpios = <&tlmm 78 GPIO_ACTIVE_LOW>; not: - reset-gpios = <&tlmm 78 GPIO_ACTIVE_HIGH>;
I was a bit reticent about changing this logic since I thought it might negatively impact @intel.com users. Googling a bit though I believe this sensor is used on "Keem Bay" which is clearly a DTS based system and is not upstream yet.
Fixes: 9214e86c0cc1 ("media: i2c: Add imx412 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Reviewed-by: Jacopo Mondi jacopo@jmondi.org Reviewed-by: Daniele Alessandrelli daniele.alessandrelli@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/imx412.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/media/i2c/imx412.c +++ b/drivers/media/i2c/imx412.c @@ -1011,7 +1011,7 @@ static int imx412_power_on(struct device struct imx412 *imx412 = to_imx412(sd); int ret;
- gpiod_set_value_cansleep(imx412->reset_gpio, 1); + gpiod_set_value_cansleep(imx412->reset_gpio, 0);
ret = clk_prepare_enable(imx412->inclk); if (ret) { @@ -1024,7 +1024,7 @@ static int imx412_power_on(struct device return 0;
error_reset: - gpiod_set_value_cansleep(imx412->reset_gpio, 0); + gpiod_set_value_cansleep(imx412->reset_gpio, 1);
return ret; } @@ -1040,7 +1040,7 @@ static int imx412_power_off(struct devic struct v4l2_subdev *sd = dev_get_drvdata(dev); struct imx412 *imx412 = to_imx412(sd);
- gpiod_set_value_cansleep(imx412->reset_gpio, 0); + gpiod_set_value_cansleep(imx412->reset_gpio, 1);
clk_disable_unprepare(imx412->inclk);
From: Bryan O'Donoghue bryan.odonoghue@linaro.org
commit 9a199694c6a1519522ec73a4571f68abe9f13d5d upstream.
The enable path does - gpio - clock
The disable path does - gpio - clock
Fix the order on the power-off path so that power-off and power-on have the same ordering for clock and gpio.
Fixes: 9214e86c0cc1 ("media: i2c: Add imx412 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Reviewed-by: Jacopo Mondi jacopo@jmondi.org Reviewed-by: Daniele Alessandrelli daniele.alessandrelli@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/imx412.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/media/i2c/imx412.c +++ b/drivers/media/i2c/imx412.c @@ -1040,10 +1040,10 @@ static int imx412_power_off(struct devic struct v4l2_subdev *sd = dev_get_drvdata(dev); struct imx412 *imx412 = to_imx412(sd);
- gpiod_set_value_cansleep(imx412->reset_gpio, 1); - clk_disable_unprepare(imx412->inclk);
+ gpiod_set_value_cansleep(imx412->reset_gpio, 1); + return 0; }
From: Stefan Mahnke-Hartmann stefan.mahnke-hartmann@infineon.com
commit e57b2523bd37e6434f4e64c7a685e3715ad21e9a upstream.
Under certain conditions uninitialized memory will be accessed. As described by TCG Trusted Platform Module Library Specification, rev. 1.59 (Part 3: Commands), if a TPM2_GetCapability is received, requesting a capability, the TPM in field upgrade mode may return a zero length list. Check the property count in tpm2_get_tpm_pt().
Fixes: 2ab3241161b3 ("tpm: migrate tpm2_get_tpm_pt() to use struct tpm_buf") Cc: stable@vger.kernel.org Signed-off-by: Stefan Mahnke-Hartmann stefan.mahnke-hartmann@infineon.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm2-cmd.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -400,7 +400,16 @@ ssize_t tpm2_get_tpm_pt(struct tpm_chip if (!rc) { out = (struct tpm2_get_cap_out *) &buf.data[TPM_HEADER_SIZE]; - *value = be32_to_cpu(out->value); + /* + * To prevent failing boot up of some systems, Infineon TPM2.0 + * returns SUCCESS on TPM2_Startup in field upgrade mode. Also + * the TPM2_Getcapability command returns a zero length list + * in field upgrade mode. + */ + if (be32_to_cpu(out->property_cnt) > 0) + *value = be32_to_cpu(out->value); + else + rc = -ENODATA; } tpm_buf_destroy(&buf); return rc;
From: Xiu Jianfeng xiujianfeng@huawei.com
commit d0dc1a7100f19121f6e7450f9cdda11926aa3838 upstream.
Currently it returns zero when CRQ response timed out, it should return an error code instead.
Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding") Signed-off-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Stefan Berger stefanb@linux.ibm.com Acked-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_ibmvtpm.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/char/tpm/tpm_ibmvtpm.c +++ b/drivers/char/tpm/tpm_ibmvtpm.c @@ -681,6 +681,7 @@ static int tpm_ibmvtpm_probe(struct vio_ if (!wait_event_timeout(ibmvtpm->crq_queue.wq, ibmvtpm->rtce_buf != NULL, HZ)) { + rc = -ENODEV; dev_err(dev, "CRQ response timed out\n"); goto init_irq_cleanup; }
From: Akira Yokosawa akiyks@gmail.com
commit 6d5aa418b3bd42cdccc36e94ee199af423ef7c84 upstream.
The reference to `explicit_in_reply_to` is pointless as when the reference was added in the form of "#15" [1], Section 15) was "The canonical patch format". The reference of "#15" had not been properly updated in a couple of reorganizations during the plain-text SubmittingPatches era.
Fix it by using `the_canonical_patch_format`.
[1]: 2ae19acaa50a ("Documentation: Add "how to write a good patch summary" to SubmittingPatches")
Signed-off-by: Akira Yokosawa akiyks@gmail.com Fixes: 5903019b2a5e ("Documentation/SubmittingPatches: convert it to ReST markup") Fixes: 9b2c76777acc ("Documentation/SubmittingPatches: enrich the Sphinx output") Cc: Jonathan Corbet corbet@lwn.net Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: stable@vger.kernel.org # v4.9+ Link: https://lore.kernel.org/r/64e105a5-50be-23f2-6cae-903a2ea98e18@gmail.com Signed-off-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/process/submitting-patches.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/process/submitting-patches.rst +++ b/Documentation/process/submitting-patches.rst @@ -72,7 +72,7 @@ as you intend it to.
The maintainer will thank you if you write your patch description in a form which can be easily pulled into Linux's source code management -system, ``git``, as a "commit log". See :ref:`explicit_in_reply_to`. +system, ``git``, as a "commit log". See :ref:`the_canonical_patch_format`.
Solve only one problem per patch. If your description starts to get long, that's a sign that you probably need to split up your patch.
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 452284407c18d8a522c3039339b1860afa0025a8 upstream.
We need to filter out ENOMEM in nfs_error_is_fatal_on_server(), because running out of memory on our client is not a server error.
Reported-by: Olga Kornievskaia aglo@umich.edu Fixes: 2dc23afffbca ("NFS: ENOMEM should also be a fatal error.") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/internal.h | 1 + 1 file changed, 1 insertion(+)
--- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -834,6 +834,7 @@ static inline bool nfs_error_is_fatal_on case 0: case -ERESTARTSYS: case -EINTR: + case -ENOMEM: return false; } return nfs_error_is_fatal(err);
From: Chuck Lever chuck.lever@oracle.com
commit ce3c4ad7f4ce5db7b4f08a1e237d8dd94b39180b upstream.
nfsd4_release_lockowner() holds clp->cl_lock when it calls check_for_locks(). However, check_for_locks() calls nfsd_file_get() / nfsd_file_put() to access the backing inode's flc_posix list, and nfsd_file_put() can sleep if the inode was recently removed.
Let's instead rely on the stateowner's reference count to gate whether the release is permitted. This should be a reliable indication of locks-in-use since file lock operations and ->lm_get_owner take appropriate references, which are released appropriately when file locks are removed.
Reported-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4state.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7299,16 +7299,12 @@ nfsd4_release_lockowner(struct svc_rqst if (sop->so_is_open_owner || !same_owner_str(sop, owner)) continue;
- /* see if there are still any locks associated with it */ - lo = lockowner(sop); - list_for_each_entry(stp, &sop->so_stateids, st_perstateowner) { - if (check_for_locks(stp->st_stid.sc_file, lo)) { - status = nfserr_locks_held; - spin_unlock(&clp->cl_lock); - return status; - } + if (atomic_read(&sop->so_count) != 1) { + spin_unlock(&clp->cl_lock); + return nfserr_locks_held; }
+ lo = lockowner(sop); nfs4_get_stateowner(sop); break; }
From: Yuntao Wang ytcoode@gmail.com
commit a2aa95b71c9bbec793b5c5fa50f0a80d882b3e8d upstream.
The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline can exceed BPF_MAX_TRAMP_PROGS.
When this happens, the assignment '*progs++ = aux->prog' in bpf_trampoline_get_progs() will cause progs array overflow as the progs field in the bpf_tramp_progs struct can only hold at most BPF_MAX_TRAMP_PROGS bpf programs.
Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code") Signed-off-by: Yuntao Wang ytcoode@gmail.com Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/trampoline.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -414,7 +414,7 @@ int bpf_trampoline_link_prog(struct bpf_ { enum bpf_tramp_prog_type kind; int err = 0; - int cnt; + int cnt = 0, i;
kind = bpf_attach_type_to_tramp(prog); mutex_lock(&tr->mutex); @@ -425,7 +425,10 @@ int bpf_trampoline_link_prog(struct bpf_ err = -EBUSY; goto out; } - cnt = tr->progs_cnt[BPF_TRAMP_FENTRY] + tr->progs_cnt[BPF_TRAMP_FEXIT]; + + for (i = 0; i < BPF_TRAMP_MAX; i++) + cnt += tr->progs_cnt[i]; + if (kind == BPF_TRAMP_REPLACE) { /* Cannot attach extension if fentry/fexit are in use. */ if (cnt) { @@ -503,16 +506,19 @@ out:
void bpf_trampoline_put(struct bpf_trampoline *tr) { + int i; + if (!tr) return; mutex_lock(&trampoline_mutex); if (!refcount_dec_and_test(&tr->refcnt)) goto out; WARN_ON_ONCE(mutex_is_locked(&tr->mutex)); - if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FENTRY]))) - goto out; - if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT]))) - goto out; + + for (i = 0; i < BPF_TRAMP_MAX; i++) + if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[i]))) + goto out; + /* This code will be executed even when the last bpf_tramp_image * is alive. All progs are detached from the trampoline and the * trampoline image is patched with jmp into epilogue to skip
From: Liu Jian liujian56@huawei.com
commit 45969b4152c1752089351cd6836a42a566d49bcf upstream.
The data length of skb frags + frag_list may be greater than 0xffff, and skb_header_pointer can not handle negative offset. So, here INT_MAX is used to check the validity of offset. Add the same change to the related function skb_store_bytes.
Fixes: 05c74e5e53f6 ("bpf: add bpf_skb_load_bytes helper") Signed-off-by: Liu Jian liujian56@huawei.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Song Liu songliubraving@fb.com Link: https://lore.kernel.org/bpf/20220416105801.88708-2-liujian56@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/filter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -1688,7 +1688,7 @@ BPF_CALL_5(bpf_skb_store_bytes, struct s
if (unlikely(flags & ~(BPF_F_RECOMPUTE_CSUM | BPF_F_INVALIDATE_HASH))) return -EINVAL; - if (unlikely(offset > 0xffff)) + if (unlikely(offset > INT_MAX)) return -EFAULT; if (unlikely(bpf_try_make_writable(skb, offset + len))) return -EFAULT; @@ -1723,7 +1723,7 @@ BPF_CALL_4(bpf_skb_load_bytes, const str { void *ptr;
- if (unlikely(offset > 0xffff)) + if (unlikely(offset > INT_MAX)) goto err_clear;
ptr = skb_header_pointer(skb, offset, len, to);
From: Yuntao Wang ytcoode@gmail.com
commit b45043192b3e481304062938a6561da2ceea46a6 upstream.
The 'n_buckets * (value_size + sizeof(struct stack_map_bucket))' part of the allocated memory for 'smap' is never used after the memlock accounting was removed, thus get rid of it.
[ Note, Daniel:
Commit b936ca643ade ("bpf: rework memlock-based memory accounting for maps") moved `cost += n_buckets * (value_size + sizeof(struct stack_map_bucket))` up and therefore before the bpf_map_area_alloc() allocation, sigh. In a later step commit c85d69135a91 ("bpf: move memory size checks to bpf_map_charge_init()"), and the overflow checks of `cost >= U32_MAX - PAGE_SIZE` moved into bpf_map_charge_init(). And then 370868107bf6 ("bpf: Eliminate rlimit-based memory accounting for stackmap maps") finally removed the bpf_map_charge_init(). Anyway, the original code did the allocation same way as /after/ this fix. ]
Fixes: b936ca643ade ("bpf: rework memlock-based memory accounting for maps") Signed-off-by: Yuntao Wang ytcoode@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20220407130423.798386-1-ytcoode@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/stackmap.c | 1 - 1 file changed, 1 deletion(-)
--- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -119,7 +119,6 @@ static struct bpf_map *stack_map_alloc(u return ERR_PTR(-E2BIG);
cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap); - cost += n_buckets * (value_size + sizeof(struct stack_map_bucket)); smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr)); if (!smap) return ERR_PTR(-ENOMEM);
From: Kumar Kartikeya Dwivedi memxor@gmail.com
commit 7b3552d3f9f6897851fc453b5131a967167e43c2 upstream.
It is not permitted to write to PTR_TO_MAP_KEY, but the current code in check_helper_mem_access would allow for it, reject this case as well, as helpers taking ARG_PTR_TO_UNINIT_MEM also take PTR_TO_MAP_KEY.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20220319080827.73251-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4587,6 +4587,11 @@ static int check_helper_mem_access(struc return check_packet_access(env, regno, reg->off, access_size, zero_size_allowed); case PTR_TO_MAP_KEY: + if (meta && meta->raw_mode) { + verbose(env, "R%d cannot write into %s\n", regno, + reg_type_str(env, reg->type)); + return -EACCES; + } return check_mem_region_access(env, regno, reg->off, access_size, reg->map_ptr->key_size, false); case PTR_TO_MAP_VALUE:
From: Kumar Kartikeya Dwivedi memxor@gmail.com
commit 97e6d7dab1ca4648821c790a2b7913d6d5d549db upstream.
The commit being fixed was aiming to disallow users from incorrectly obtaining writable pointer to memory that is only meant to be read. This is enforced now using a MEM_RDONLY flag.
For instance, in case of global percpu variables, when the BTF type is not struct (e.g. bpf_prog_active), the verifier marks register type as PTR_TO_MEM | MEM_RDONLY from bpf_this_cpu_ptr or bpf_per_cpu_ptr helpers. However, when passing such pointer to kfunc, global funcs, or BPF helpers, in check_helper_mem_access, there is no expectation MEM_RDONLY flag will be set, hence it is checked as pointer to writable memory. Later, verifier sets up argument type of global func as PTR_TO_MEM | PTR_MAYBE_NULL, so user can use a global func to get around the limitations imposed by this flag.
This check will also cover global non-percpu variables that may be introduced in kernel BTF in future.
Also, we update the log message for PTR_TO_BUF case to be similar to PTR_TO_MEM case, so that the reason for error is clear to user.
Fixes: 34d3a78c681e ("bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.") Reviewed-by: Hao Luo haoluo@google.com Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20220319080827.73251-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4602,13 +4602,23 @@ static int check_helper_mem_access(struc return check_map_access(env, regno, reg->off, access_size, zero_size_allowed); case PTR_TO_MEM: + if (type_is_rdonly_mem(reg->type)) { + if (meta && meta->raw_mode) { + verbose(env, "R%d cannot write into %s\n", regno, + reg_type_str(env, reg->type)); + return -EACCES; + } + } return check_mem_region_access(env, regno, reg->off, access_size, reg->mem_size, zero_size_allowed); case PTR_TO_BUF: if (type_is_rdonly_mem(reg->type)) { - if (meta && meta->raw_mode) + if (meta && meta->raw_mode) { + verbose(env, "R%d cannot write into %s\n", regno, + reg_type_str(env, reg->type)); return -EACCES; + }
buf_info = "rdonly"; max_access = &env->prog->aux->max_rdonly_access;
On Fri, 3 Jun 2022 19:42:40 +0200, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.45 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.45-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
5.15.45-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)
Tested-by: Fox Chen foxhlchen@gmail.com
Hi Greg,
On Fri, Jun 03, 2022 at 07:42:40PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.45 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 11.3.1 20220531): 62 configs -> no failure arm (gcc version 11.3.1 20220531): 99 configs -> no failure arm64 (gcc version 11.3.1 20220531): 3 configs -> no failure x86_64 (gcc version 11.3.1 20220531): 4 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1] arm64: Booted on rpi4b (4GB model). No regression. [2] mips: Booted on ci20 board. No regression. [3]
[1]. https://openqa.qa.codethink.co.uk/tests/1262 [2]. https://openqa.qa.codethink.co.uk/tests/1268 [3]. https://openqa.qa.codethink.co.uk/tests/1269
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Fri, 3 Jun 2022 at 23:20, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.45 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.45-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.15.45-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.15.y * git commit: c5d3dc2dfc027824f60e0e6913265b16296eceec * git describe: v5.15.43-213-gc5d3dc2dfc02 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15....
## Test Regressions (compared to v5.15.43-146-g510d04da560c) No test regressions found.
## Metric Regressions (compared to v5.15.43-146-g510d04da560c) No metric regressions found.
## Test Fixes (compared to v5.15.43-146-g510d04da560c) No test fixes found.
## Metric Fixes (compared to v5.15.43-146-g510d04da560c) No metric fixes found.
## Test result summary total: 135772, pass: 123116, fail: 230, skip: 11846, xfail: 580
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 314 total, 314 passed, 0 failed * arm64: 58 total, 58 passed, 0 failed * i386: 52 total, 49 passed, 3 failed * mips: 37 total, 37 passed, 0 failed * parisc: 12 total, 12 passed, 0 failed * powerpc: 54 total, 54 passed, 0 failed * riscv: 22 total, 22 passed, 0 failed * s390: 21 total, 21 passed, 0 failed * sh: 24 total, 24 passed, 0 failed * sparc: 12 total, 12 passed, 0 failed * x86_64: 56 total, 55 passed, 1 failed
## Test suites summary * fwts * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-cap_bounds-tests * ltp-commands * ltp-commands-tests * ltp-containers * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests * ltp-fcntl-locktests-tests * ltp-filecaps * ltp-filecaps-tests * ltp-fs * ltp-fs-tests * ltp-fs_bind * ltp-fs_bind-tests * ltp-fs_perms_simple * ltp-fs_perms_simple-tests * ltp-fsx * ltp-fsx-tests * ltp-hugetlb * ltp-hugetlb-tests * ltp-io * ltp-io-tests * ltp-ipc * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty * ltp-pty-tests * ltp-sched-tests * ltp-securebits * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * perf/Zstd-perf.data-compression * rcutorture * ssuite * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On Fri, Jun 03, 2022 at 07:42:40PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.45 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 488 pass: 488 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Fri, Jun 03, 2022 at 07:42:40PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.45 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully cross-compiled for arm (multi_v7_defconfig, GCC 12.1.0, neon FPU) and arm64 (bcm2711_defconfig, GCC 12.1.0).
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On 6/3/22 10:42 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.45 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.45-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
linux-stable-mirror@lists.linaro.org