The virtio specification virtio-v1.1-cs01 states: "Transitional devices
MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not
been acknowledged by the driver." This is exactly what QEMU as of 6.1
has done relying solely on VIRTIO_F_VERSION_1 for detecting that.
However, the specification also says: "... the driver MAY read (but MUST
NOT write) the device-specific configuration fields to check that it can
support the device ..." before setting FEATURES_OK.
In that case, any transitional device relying solely on
VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in
legacy format. In particular, this implies that it is in big endian
format for big endian guests. This naturally confuses the driver which
expects little endian in the modern mode.
It is probably a good idea to amend the spec to clarify that
VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
is complete. Before validate callback existed, config space was only
read after FEATURES_OK. However, we already have two regressions, so
let's address this here as well.
The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and
the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when
virtio 1.0 is used on both sides. The latter renders virtio-blk unusable
with DASD backing, because things simply don't work with the default.
See Fixes tags for relevant commits.
For QEMU, we can work around the issue by writing out the feature bits
with VIRTIO_F_VERSION_1 bit set. We (ab)use the finalize_features
config op for this. This isn't enough to address all vhost devices since
these do not get the features until FEATURES_OK, however it looks like
the affected devices actually never handled the endianness for legacy
mode correctly, so at least that's not a regression.
No devices except virtio net and virtio blk seem to be affected.
Long term the right thing to do is to fix the hypervisors.
Cc: <stable(a)vger.kernel.org> #v4.11
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
Reported-by: markver(a)us.ibm.com
Reviewed-by: Cornelia Huck <cohuck(a)redhat.com>
---
@Connie: I made some more commit message changes to accommodate Michael's
requests. I just assumed these will work or you as well and kept your
r-b. Please shout at me if it needs to be dropped :)
---
drivers/virtio/virtio.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..236081afe9a2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -239,6 +239,17 @@ static int virtio_dev_probe(struct device *_d)
driver_features_legacy = driver_features;
}
+ /*
+ * Some devices detect legacy solely via F_VERSION_1. Write
+ * F_VERSION_1 to force LE config space accesses before FEATURES_OK for
+ * these when needed.
+ */
+ if (drv->validate && !virtio_legacy_is_little_endian()
+ && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) {
+ dev->features = BIT_ULL(VIRTIO_F_VERSION_1);
+ dev->config->finalize_features(dev);
+ }
+
if (device_features & (1ULL << VIRTIO_F_VERSION_1))
dev->features = driver_features & device_features;
else
base-commit: 60a9483534ed0d99090a2ee1d4bb0b8179195f51
--
2.25.1
From: Stephen Boyd <swboyd(a)chromium.org>
If a cell has 'nbits' equal to a multiple of BITS_PER_BYTE the logic
*p &= GENMASK((cell->nbits%BITS_PER_BYTE) - 1, 0);
will become undefined behavior because nbits modulo BITS_PER_BYTE is 0, and we
subtract one from that making a large number that is then shifted more than the
number of bits that fit into an unsigned long.
UBSAN reports this problem:
UBSAN: shift-out-of-bounds in drivers/nvmem/core.c:1386:8
shift exponent 64 is too large for 64-bit type 'unsigned long'
CPU: 6 PID: 7 Comm: kworker/u16:0 Not tainted 5.15.0-rc3+ #9
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
dump_backtrace+0x0/0x170
show_stack+0x24/0x30
dump_stack_lvl+0x64/0x7c
dump_stack+0x18/0x38
ubsan_epilogue+0x10/0x54
__ubsan_handle_shift_out_of_bounds+0x180/0x194
__nvmem_cell_read+0x1ec/0x21c
nvmem_cell_read+0x58/0x94
nvmem_cell_read_variable_common+0x4c/0xb0
nvmem_cell_read_variable_le_u32+0x40/0x100
a6xx_gpu_init+0x170/0x2f4
adreno_bind+0x174/0x284
component_bind_all+0xf0/0x264
msm_drm_bind+0x1d8/0x7a0
try_to_bring_up_master+0x164/0x1ac
__component_add+0xbc/0x13c
component_add+0x20/0x2c
dp_display_probe+0x340/0x384
platform_probe+0xc0/0x100
really_probe+0x110/0x304
__driver_probe_device+0xb8/0x120
driver_probe_device+0x4c/0xfc
__device_attach_driver+0xb0/0x128
bus_for_each_drv+0x90/0xdc
__device_attach+0xc8/0x174
device_initial_probe+0x20/0x2c
bus_probe_device+0x40/0xa4
deferred_probe_work_func+0x7c/0xb8
process_one_work+0x128/0x21c
process_scheduled_works+0x40/0x54
worker_thread+0x1ec/0x2a8
kthread+0x138/0x158
ret_from_fork+0x10/0x20
Fix it by making sure there are any bits to mask out.
Cc: Douglas Anderson <dianders(a)chromium.org>
Fixes: 69aba7948cbe ("nvmem: Add a simple NVMEM framework for consumers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Stephen Boyd <swboyd(a)chromium.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
---
Hi Greg,
Could you pick this up for next possible 5.15 rc.
Thanks,
--srini
drivers/nvmem/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 3d87fadaa160..8976da38b375 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -1383,7 +1383,8 @@ static void nvmem_shift_read_buffer_in_place(struct nvmem_cell *cell, void *buf)
*p-- = 0;
/* clear msb bits if any leftover in the last byte */
- *p &= GENMASK((cell->nbits%BITS_PER_BYTE) - 1, 0);
+ if (cell->nbits % BITS_PER_BYTE)
+ *p &= GENMASK((cell->nbits % BITS_PER_BYTE) - 1, 0);
}
static int __nvmem_cell_read(struct nvmem_device *nvmem,
--
2.21.0
This is the start of the stable review cycle for the 4.19.211 release.
There are 27 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 14 Oct 2021 09:33:32 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.211-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.211-rc3
Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
x86/Kconfig: Correct reference to MWINCHIP3D
Jamie Iles <quic_jiles(a)quicinc.com>
i2c: acpi: fix resource leak in reconfiguration device addition
Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
i40e: Fix freeing of uninitialized misc IRQ vector
Jiri Benc <jbenc(a)redhat.com>
i40e: fix endless loop under rtnl
Eric Dumazet <edumazet(a)google.com>
rtnetlink: fix if_nlmsg_stats_size() under estimation
Yang Yingliang <yangyingliang(a)huawei.com>
drm/nouveau/debugfs: fix file release memory leak
Eric Dumazet <edumazet(a)google.com>
netlink: annotate data races around nlk->bound
Sean Anderson <sean.anderson(a)seco.com>
net: sfp: Fix typo in state machine debug string
Eric Dumazet <edumazet(a)google.com>
net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size()
Oleksij Rempel <o.rempel(a)pengutronix.de>
ARM: imx6: disable the GIC CPU interface before calling stby-poweroff sequence
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
ptp_pch: Load module automatically if ID matches
Pali Rohár <pali(a)kernel.org>
powerpc/fsl/dts: Fix phy-connection-type for fm1mac3
Eric Dumazet <edumazet(a)google.com>
net_sched: fix NULL deref in fifo_set_limit()
Pavel Skripkin <paskripkin(a)gmail.com>
phy: mdio: fix memory leak
Tatsuhiko Yasumatsu <th.yasumatsu(a)gmail.com>
bpf: Fix integer overflow in prealloc_elems_and_freelist()
Johan Almbladh <johan.almbladh(a)anyfinetworks.com>
bpf, arm: Fix register clobbering in div/mod implementation
Max Filippov <jcmvbkbc(a)gmail.com>
xtensa: call irqchip_init only when CONFIG_USE_OF is selected
Piotr Krysiuk <piotras(a)gmail.com>
bpf, mips: Validate conditional branch offsets
David Heidelberg <david(a)ixit.cz>
ARM: dts: qcom: apq8064: use compatible which contains chipid
Roger Quadros <rogerq(a)kernel.org>
ARM: dts: omap3430-sdp: Fix NAND device node
Juergen Gross <jgross(a)suse.com>
xen/balloon: fix cancelled balloon action
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero
Zheng Liang <zhengliang6(a)huawei.com>
ovl: fix missing negative dentry check in ovl_rename()
Jan Beulich <jbeulich(a)suse.com>
xen/privcmd: fix error handling in mmap-resource processing
Johan Hovold <johan(a)kernel.org>
USB: cdc-acm: fix break reporting
Johan Hovold <johan(a)kernel.org>
USB: cdc-acm: fix racy tty buffer accesses
Ben Hutchings <ben(a)decadent.org.uk>
Partially revert "usb: Kconfig: using select for USB_COMMON dependency"
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/omap3430-sdp.dts | 2 +-
arch/arm/boot/dts/qcom-apq8064.dtsi | 3 +-
arch/arm/mach-imx/pm-imx6.c | 2 +
arch/arm/net/bpf_jit_32.c | 19 ++++++++++
arch/mips/net/bpf_jit.c | 57 ++++++++++++++++++++++-------
arch/powerpc/boot/dts/fsl/t1023rdb.dts | 2 +-
arch/x86/Kconfig | 2 +-
arch/xtensa/kernel/irq.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_debugfs.c | 1 +
drivers/i2c/i2c-core-acpi.c | 1 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 5 ++-
drivers/net/phy/mdio_bus.c | 7 ++++
drivers/net/phy/sfp.c | 2 +-
drivers/ptp/ptp_pch.c | 1 +
drivers/usb/Kconfig | 3 +-
drivers/usb/class/cdc-acm.c | 8 ++++
drivers/xen/balloon.c | 21 ++++++++---
drivers/xen/privcmd.c | 7 ++--
fs/nfsd/nfs4xdr.c | 19 ++++++----
fs/overlayfs/dir.c | 10 +++--
kernel/bpf/stackmap.c | 3 +-
net/bridge/br_netlink.c | 2 +-
net/core/rtnetlink.c | 2 +-
net/netlink/af_netlink.c | 14 +++++--
net/sched/sch_fifo.c | 3 ++
26 files changed, 148 insertions(+), 54 deletions(-)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a46044a92add6a400f4dada7b943b30221f7cc80 Mon Sep 17 00:00:00 2001
From: Niklas Schnelle <schnelle(a)linux.ibm.com>
Date: Wed, 22 Sep 2021 15:55:12 +0200
Subject: [PATCH] s390/pci: fix zpci_zdev_put() on reserve
Since commit 2a671f77ee49 ("s390/pci: fix use after free of zpci_dev")
the reference count of a zpci_dev is incremented between
pcibios_add_device() and pcibios_release_device() which was supposed to
prevent the zpci_dev from being freed while the common PCI code has
access to it. It was missed however that the handling of zPCI
availability events assumed that once zpci_zdev_put() was called no
later availability event would still see the device. With the previously
mentioned commit however this assumption no longer holds and we must
make sure that we only drop the initial long-lived reference the zPCI
subsystem holds exactly once.
Do so by introducing a zpci_device_reserved() function that handles when
a device is reserved. Here we make sure the zpci_dev will not be
considered for further events by removing it from the zpci_list.
This also means that the device actually stays in the
ZPCI_FN_STATE_RESERVED state between the time we know it has been
reserved and the final reference going away. We thus need to consider it
a real state instead of just a conceptual state after the removal. The
final cleanup of PCI resources, removal from zbus, and destruction of
the IOMMU stays in zpci_release_device() to make sure holders of the
reference do see valid data until the release.
Fixes: 2a671f77ee49 ("s390/pci: fix use after free of zpci_dev")
Cc: stable(a)vger.kernel.org
Signed-off-by: Niklas Schnelle <schnelle(a)linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index e4803ec51110..6b3c366af78e 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -207,6 +207,8 @@ int zpci_enable_device(struct zpci_dev *);
int zpci_disable_device(struct zpci_dev *);
int zpci_scan_configured_device(struct zpci_dev *zdev, u32 fh);
int zpci_deconfigure_device(struct zpci_dev *zdev);
+void zpci_device_reserved(struct zpci_dev *zdev);
+bool zpci_is_device_configured(struct zpci_dev *zdev);
int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
int zpci_unregister_ioat(struct zpci_dev *, u8);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index e7e6788d75a8..b833155ce838 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -92,7 +92,7 @@ void zpci_remove_reserved_devices(void)
spin_unlock(&zpci_list_lock);
list_for_each_entry_safe(zdev, tmp, &remove, entry)
- zpci_zdev_put(zdev);
+ zpci_device_reserved(zdev);
}
int pci_domain_nr(struct pci_bus *bus)
@@ -751,6 +751,14 @@ struct zpci_dev *zpci_create_device(u32 fid, u32 fh, enum zpci_state state)
return ERR_PTR(rc);
}
+bool zpci_is_device_configured(struct zpci_dev *zdev)
+{
+ enum zpci_state state = zdev->state;
+
+ return state != ZPCI_FN_STATE_RESERVED &&
+ state != ZPCI_FN_STATE_STANDBY;
+}
+
/**
* zpci_scan_configured_device() - Scan a freshly configured zpci_dev
* @zdev: The zpci_dev to be configured
@@ -822,6 +830,31 @@ int zpci_deconfigure_device(struct zpci_dev *zdev)
return 0;
}
+/**
+ * zpci_device_reserved() - Mark device as resverved
+ * @zdev: the zpci_dev that was reserved
+ *
+ * Handle the case that a given zPCI function was reserved by another system.
+ * After a call to this function the zpci_dev can not be found via
+ * get_zdev_by_fid() anymore but may still be accessible via existing
+ * references though it will not be functional anymore.
+ */
+void zpci_device_reserved(struct zpci_dev *zdev)
+{
+ if (zdev->has_hp_slot)
+ zpci_exit_slot(zdev);
+ /*
+ * Remove device from zpci_list as it is going away. This also
+ * makes sure we ignore subsequent zPCI events for this device.
+ */
+ spin_lock(&zpci_list_lock);
+ list_del(&zdev->entry);
+ spin_unlock(&zpci_list_lock);
+ zdev->state = ZPCI_FN_STATE_RESERVED;
+ zpci_dbg(3, "rsv fid:%x\n", zdev->fid);
+ zpci_zdev_put(zdev);
+}
+
void zpci_release_device(struct kref *kref)
{
struct zpci_dev *zdev = container_of(kref, struct zpci_dev, kref);
@@ -843,6 +876,12 @@ void zpci_release_device(struct kref *kref)
case ZPCI_FN_STATE_STANDBY:
if (zdev->has_hp_slot)
zpci_exit_slot(zdev);
+ spin_lock(&zpci_list_lock);
+ list_del(&zdev->entry);
+ spin_unlock(&zpci_list_lock);
+ zpci_dbg(3, "rsv fid:%x\n", zdev->fid);
+ fallthrough;
+ case ZPCI_FN_STATE_RESERVED:
if (zdev->has_resources)
zpci_cleanup_bus_resources(zdev);
zpci_bus_device_unregister(zdev);
@@ -851,10 +890,6 @@ void zpci_release_device(struct kref *kref)
default:
break;
}
-
- spin_lock(&zpci_list_lock);
- list_del(&zdev->entry);
- spin_unlock(&zpci_list_lock);
zpci_dbg(3, "rem fid:%x\n", zdev->fid);
kfree(zdev);
}
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index c856f80cb21b..5b8d647523f9 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -140,7 +140,7 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf)
/* The 0x0304 event may immediately reserve the device */
if (!clp_get_state(zdev->fid, &state) &&
state == ZPCI_FN_STATE_RESERVED) {
- zpci_zdev_put(zdev);
+ zpci_device_reserved(zdev);
}
}
break;
@@ -151,7 +151,7 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf)
case 0x0308: /* Standby -> Reserved */
if (!zdev)
break;
- zpci_zdev_put(zdev);
+ zpci_device_reserved(zdev);
break;
default:
break;
diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
index 014868752cd4..dcefdb42ac46 100644
--- a/drivers/pci/hotplug/s390_pci_hpc.c
+++ b/drivers/pci/hotplug/s390_pci_hpc.c
@@ -62,14 +62,7 @@ static int get_power_status(struct hotplug_slot *hotplug_slot, u8 *value)
struct zpci_dev *zdev = container_of(hotplug_slot, struct zpci_dev,
hotplug_slot);
- switch (zdev->state) {
- case ZPCI_FN_STATE_STANDBY:
- *value = 0;
- break;
- default:
- *value = 1;
- break;
- }
+ *value = zpci_is_device_configured(zdev) ? 1 : 0;
return 0;
}
If two processes mount same superblock, memory leak occurs:
CPU0 | CPU1
do_new_mount | do_new_mount
fs_set_subtype | fs_set_subtype
kstrdup |
| kstrdup
memrory leak |
Fix this by adding a write lock while calling fs_set_subtype.
Linus's tree already have refactoring patchset [1], one of them can fix this bug:
c30da2e981a7 (fuse: convert to use the new mount API)
Since we did not merge the refactoring patchset in this branch, I create this patch.
[1] https://patchwork.kernel.org/project/linux-fsdevel/patch/20190903113640.798…
Fixes: 79c0b2df79eb (add filesystem subtype support)
Cc: David Howells <dhowells(a)redhat.com>
Signed-off-by: ChenXiaoSong <chenxiaosong2(a)huawei.com>
---
fs/namespace.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 2f3c6a0350a8..396ff1bcfdad 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2490,9 +2490,12 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
return -ENODEV;
mnt = vfs_kern_mount(type, sb_flags, name, data);
- if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
- !mnt->mnt_sb->s_subtype)
- mnt = fs_set_subtype(mnt, fstype);
+ if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE)) {
+ down_write(&mnt->mnt_sb->s_umount);
+ if (!mnt->mnt_sb->s_subtype)
+ mnt = fs_set_subtype(mnt, fstype);
+ up_write(&mnt->mnt_sb->s_umount);
+ }
put_filesystem(type);
if (IS_ERR(mnt))
--
2.25.4