From: Dmitry Mastykin mastichi@gmail.com
[ Upstream commit aad11473f8f4be3df86461081ce35ec5b145ba68 ]
We leak nfs_fattr and nfs4_label every time we set a security xattr.
Signed-off-by: Dmitry Mastykin mastichi@gmail.com
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/nfs/nfs4proc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f0953200acd08..05490d4784f1a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6268,6 +6268,7 @@ nfs4_set_security_label(struct inode *inode, const void *buf, size_t buflen)
 	if (status == 0)
 		nfs_setsecurity(inode, fattr);
+	nfs_free_fattr(fattr);
 	return status;
 }
 #endif	/* CONFIG_NFS_V4_SECURITY_LABEL */
From: Sagi Grimberg sagi@grimberg.me
[ Upstream commit 134d0b3f2440cdddd12fc3444c9c0f62331ce6fc ]
There is an inherent race where a symlink file may have been overridden (by a different client) between lookup and readlink, resulting in a spurious EIO error returned to userspace. Fix this by propagating back ESTALE errors, such that the vfs will retry the lookup/get_link (similar to nfs4_file_open) at least once.
Cc: Dan Aloni dan.aloni@vastdata.com
Signed-off-by: Sagi Grimberg sagi@grimberg.me
Reviewed-by: Jeff Layton jlayton@kernel.org
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/nfs/symlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 0e27a2e4e68b8..13818129d268f 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file *file, struct folio *folio)
 error:
 	folio_set_error(folio);
 	folio_unlock(folio);
-	return -EIO;
+	return error;
 }
 
 static const char *nfs_get_link(struct dentry *dentry,
From: Jan Kara jack@suse.cz
[ Upstream commit a527c3ba41c4c61e2069bfce4091e5515f06a8dd ]
When we are doing WB_SYNC_ALL writeback, nfs submits write requests with NFS_FILE_SYNC flag to the server (which then generally treats it as an O_SYNC write). This helps to reduce latency for single requests but when submitting more requests, additional fsyncs on the server side hurt latency. NFS generally avoids this additional overhead by not setting NFS_FILE_SYNC if desc->pg_moreio is set.
However this logic doesn't always work. When we do random 4k writes to a huge file and then call fsync(2), each page writeback is going to be sent with NFS_FILE_SYNC: after preparing one page for writeback, we start writing back the next one, and nfs_do_writepage() calls nfs_pageio_cond_complete(), which finds that the page is not contiguous with the previously prepared IO and submits it *without* setting desc->pg_moreio. Hence NFS_FILE_SYNC is used, resulting in poor performance.
Fix the problem by setting desc->pg_moreio in nfs_pageio_cond_complete() before submitting outstanding IO. This improves throughput of fsync-after-random-writes on my test SSD from ~70MB/s to ~250MB/s.
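As an illustration, a minimal userspace sketch of the workload described above (the path, file size and iteration count are assumptions for the example, not taken from the patch):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	const off_t file_size = 1024L * 1024 * 1024;	/* 1 GiB test file, assumed */
	const size_t blk = 4096;
	char buf[4096] = { 0 };
	int fd = open("/mnt/nfs/testfile", O_WRONLY);	/* assumed NFS path */
	int i;

	if (fd < 0)
		return 1;
	for (i = 0; i < 100000; i++) {
		/* random, non-contiguous 4k writes across the file */
		off_t off = (off_t)(rand() % (file_size / blk)) * blk;
		if (pwrite(fd, buf, blk, off) != (ssize_t)blk)
			return 1;
	}
	/* WB_SYNC_ALL writeback of many non-contiguous dirty pages */
	fsync(fd);
	close(fd);
	return 0;
}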
Signed-off-by: Jan Kara jack@suse.cz
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/nfs/pagelist.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6efb5068c116e..040b6b79c75e5 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1545,6 +1545,11 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
 			continue;
 		} else if (index == prev->wb_index + 1)
 			continue;
+		/*
+		 * We will submit more requests after these. Indicate
+		 * this to the underlying layers.
+		 */
+		desc->pg_moreio = 1;
 		nfs_pageio_complete(desc);
 		break;
 	}
From: Scott Mayhew smayhew@redhat.com
[ Upstream commit 0c8c7c559740d2d8b66048162af6c4dba8f0c88c ]
This is a slight variation on a patch previously proposed by Neil Brown that never got merged.
Prior to commit 5ceb9d7fdaaf ("NFS: Refactor nfs_lookup_revalidate()"), any error from nfs_lookup_verify_inode() other than -ESTALE would result in nfs_lookup_revalidate() returning that error (-ESTALE is mapped to zero).
Since that commit, all errors result in nfs_lookup_revalidate() returning zero, resulting in dentries being invalidated where they previously were not (particularly in the case of -ERESTARTSYS).
Fix it by passing the actual error code to nfs_lookup_revalidate_done(), and leaving the decision on whether to map the error code to zero or one to nfs_lookup_revalidate_done().
A simple reproducer is to run the following python code in a subdirectory of an NFS mount (not in the root of the NFS mount):
---8<---
import os
import multiprocessing
import time

if __name__=="__main__":
    multiprocessing.set_start_method("spawn")

    count = 0
    while True:
        try:
            os.getcwd()
            pool = multiprocessing.Pool(10)
            pool.close()
            pool.terminate()
            count += 1
        except Exception as e:
            print(f"Failed after {count} iterations")
            print(e)
            break
---8<---
Prior to commit 5ceb9d7fdaaf, the above code would run indefinitely. After commit 5ceb9d7fdaaf, it fails almost immediately with -ENOENT.
Signed-off-by: Scott Mayhew smayhew@redhat.com
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/nfs/dir.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2a0f069d5a096..39f7549afcf5b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1625,7 +1625,16 @@ nfs_lookup_revalidate_done(struct inode *dir, struct dentry *dentry,
 	switch (error) {
 	case 1:
 		break;
-	case 0:
+	case -ETIMEDOUT:
+		if (inode && (IS_ROOT(dentry) ||
+			      NFS_SERVER(inode)->flags & NFS_MOUNT_SOFTREVAL))
+			error = 1;
+		break;
+	case -ESTALE:
+	case -ENOENT:
+		error = 0;
+		fallthrough;
+	default:
 		/*
 		 * We can't d_drop the root of a disconnected tree:
 		 * its d_hash is on the s_anon list and d_drop() would hide
@@ -1680,18 +1689,8 @@ static int nfs_lookup_revalidate_dentry(struct inode *dir,
 	dir_verifier = nfs_save_change_attribute(dir);
 	ret = NFS_PROTO(dir)->lookup(dir, dentry, fhandle, fattr);
-	if (ret < 0) {
-		switch (ret) {
-		case -ESTALE:
-		case -ENOENT:
-			ret = 0;
-			break;
-		case -ETIMEDOUT:
-			if (NFS_SERVER(inode)->flags & NFS_MOUNT_SOFTREVAL)
-				ret = 1;
-		}
+	if (ret < 0)
 		goto out;
-	}
 
 	/* Request help from readdirplus */
 	nfs_lookup_advise_force_readdirplus(dir, flags);
@@ -1735,7 +1734,7 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 			 unsigned int flags)
 {
 	struct inode *inode;
-	int error;
+	int error = 0;
 
 	nfs_inc_stats(dir, NFSIOS_DENTRYREVALIDATE);
 	inode = d_inode(dentry);
@@ -1780,7 +1779,7 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 out_bad:
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
-	return nfs_lookup_revalidate_done(dir, dentry, inode, 0);
+	return nfs_lookup_revalidate_done(dir, dentry, inode, error);
 }
 
 static int
From: Baokun Li libaokun1@huawei.com
[ Upstream commit a26dc49df37e996876f50a0210039b2d211fdd6f ]
This prevents malicious processes from completing random copen/cread requests and crashing the system. Added checks are listed below:
* Generic: copen can only complete open requests, and cread can only complete
  read requests.
* For copen, ondemand_id must not be 0, because this indicates that the
  request has not been read by the daemon.
* For cread, the object corresponding to fd and req should be the same.
Signed-off-by: Baokun Li libaokun1@huawei.com
Link: https://lore.kernel.org/r/20240522114308.2402121-7-libaokun@huaweicloud.com
Acked-by: Jeff Layton jlayton@kernel.org
Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/cachefiles/ondemand.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 4b39f0422e590..b8e9cdc93643e 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -97,12 +97,12 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
 }
 
 static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
-					 unsigned long arg)
+					 unsigned long id)
 {
 	struct cachefiles_object *object = filp->private_data;
 	struct cachefiles_cache *cache = object->volume->cache;
 	struct cachefiles_req *req;
-	unsigned long id;
+	XA_STATE(xas, &cache->reqs, id);
 
 	if (ioctl != CACHEFILES_IOC_READ_COMPLETE)
 		return -EINVAL;
@@ -110,10 +110,15 @@ static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
 	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
 		return -EOPNOTSUPP;
 
-	id = arg;
-	req = xa_erase(&cache->reqs, id);
-	if (!req)
+	xa_lock(&cache->reqs);
+	req = xas_load(&xas);
+	if (!req || req->msg.opcode != CACHEFILES_OP_READ ||
+	    req->object != object) {
+		xa_unlock(&cache->reqs);
 		return -EINVAL;
+	}
+	xas_store(&xas, NULL);
+	xa_unlock(&cache->reqs);
 
 	trace_cachefiles_ondemand_cread(object, id);
 	complete(&req->done);
@@ -142,6 +147,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	unsigned long id;
 	long size;
 	int ret;
+	XA_STATE(xas, &cache->reqs, 0);
 
 	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
 		return -EOPNOTSUPP;
@@ -165,9 +171,16 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	if (ret)
 		return ret;
 
-	req = xa_erase(&cache->reqs, id);
-	if (!req)
+	xa_lock(&cache->reqs);
+	xas.xa_index = id;
+	req = xas_load(&xas);
+	if (!req || req->msg.opcode != CACHEFILES_OP_OPEN ||
+	    !req->object->ondemand->ondemand_id) {
+		xa_unlock(&cache->reqs);
 		return -EINVAL;
+	}
+	xas_store(&xas, NULL);
+	xa_unlock(&cache->reqs);
 
 	/* fail OPEN request if copen format is invalid */
 	ret = kstrtol(psize, 0, &size);
From: Zizhi Wo wozizhi@huawei.com
[ Upstream commit 4f8703fb3482f92edcfd31661857b16fec89c2c0 ]
If copen is called maliciously from user space, it may delete the request corresponding to a random id, and that request may not have been read yet.
Note that when the object is set to reopen, the open request in the case above completes while the object is still in the reopen state. As a result, the request corresponding to this object is always skipped by the select_req function, so the read request is never completed and blocks other processes.
Fix this issue by simply setting the object to close if its id is < 0 in copen.
Signed-off-by: Zizhi Wo wozizhi@huawei.com
Signed-off-by: Baokun Li libaokun1@huawei.com
Link: https://lore.kernel.org/r/20240522114308.2402121-11-libaokun@huaweicloud.com
Acked-by: Jeff Layton jlayton@kernel.org
Reviewed-by: Jia Zhu zhujia.zj@bytedance.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/cachefiles/ondemand.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index b8e9cdc93643e..0862d69d64759 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -182,6 +182,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	xas_store(&xas, NULL);
 	xa_unlock(&cache->reqs);
 
+	info = req->object->ondemand;
 	/* fail OPEN request if copen format is invalid */
 	ret = kstrtol(psize, 0, &size);
 	if (ret) {
@@ -201,7 +202,6 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 		goto out;
 	}
 
-	info = req->object->ondemand;
 	spin_lock(&info->lock);
 	/*
 	 * The anonymous fd was closed before copen ? Fail the request.
@@ -241,6 +241,11 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	wake_up_all(&cache->daemon_pollwq);
 
 out:
+	spin_lock(&info->lock);
+	/* Need to set object close to avoid reopen status continuing */
+	if (info->ondemand_id == CACHEFILES_ONDEMAND_ID_CLOSED)
+		cachefiles_ondemand_set_object_close(req->object);
+	spin_unlock(&info->lock);
 	complete(&req->done);
 	return ret;
 }
From: Baokun Li libaokun1@huawei.com
[ Upstream commit bc9dde6155464e906e630a0a5c17a4cab241ffbb ]
Replacing wait_for_completion() with wait_for_completion_killable() in cachefiles_ondemand_send_req() allows us to kill processes that would otherwise trigger a hung task if the daemon is misbehaving.
But for now only CACHEFILES_OP_READ is killable, because OP_CLOSE and OP_OPEN are initiated from kworker context and signals are prohibited in those kworkers.
Note that when the req in the xarray changes, i.e. xas_load(&xas) != req, it means that another process will complete the current request soon, so wait again for the request to be completed.
In addition, add the cachefiles_ondemand_finish_req() helper function to simplify the code.
Suggested-by: Hou Tao houtao1@huawei.com
Signed-off-by: Baokun Li libaokun1@huawei.com
Link: https://lore.kernel.org/r/20240522114308.2402121-13-libaokun@huaweicloud.com
Acked-by: Jeff Layton jlayton@kernel.org
Reviewed-by: Jia Zhu zhujia.zj@bytedance.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/cachefiles/ondemand.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 0862d69d64759..9513efaeb7ab6 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -380,6 +380,20 @@ static struct cachefiles_req *cachefiles_ondemand_select_req(struct xa_state *xa
 	return NULL;
 }
 
+static inline bool cachefiles_ondemand_finish_req(struct cachefiles_req *req,
+						  struct xa_state *xas, int err)
+{
+	if (unlikely(!xas || !req))
+		return false;
+
+	if (xa_cmpxchg(xas->xa, xas->xa_index, req, NULL, 0) != req)
+		return false;
+
+	req->error = err;
+	complete(&req->done);
+	return true;
+}
+
 ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 					char __user *_buffer, size_t buflen)
 {
@@ -443,16 +457,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 out:
 	cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
 	/* Remove error request and CLOSE request has no reply */
-	if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
-		xas_reset(&xas);
-		xas_lock(&xas);
-		if (xas_load(&xas) == req) {
-			req->error = ret;
-			complete(&req->done);
-			xas_store(&xas, NULL);
-		}
-		xas_unlock(&xas);
-	}
+	if (ret || msg->opcode == CACHEFILES_OP_CLOSE)
+		cachefiles_ondemand_finish_req(req, &xas, ret);
 	cachefiles_req_put(req);
 	return ret ? ret : n;
 }
@@ -544,8 +550,18 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
 		goto out;
 
 	wake_up_all(&cache->daemon_pollwq);
-	wait_for_completion(&req->done);
-	ret = req->error;
+wait:
+	ret = wait_for_completion_killable(&req->done);
+	if (!ret) {
+		ret = req->error;
+	} else {
+		ret = -EINTR;
+		if (!cachefiles_ondemand_finish_req(req, &xas, ret)) {
+			/* Someone will complete it soon. */
+			cpu_relax();
+			goto wait;
+		}
+	}
 	cachefiles_req_put(req);
 	return ret;
 out:
From: Yuntao Wang yuntao.wang@linux.dev
[ Upstream commit ed8c7fbdfe117abbef81f65428ba263118ef298a ]
The maximum possible return value of find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) is maxbit. This return value, multiplied by BITS_PER_LONG, gives the value of bitbit, which can never be greater than maxfd; at most it can equal maxfd. So the check 'if (bitbit > maxfd)' will never be true.
Moreover, when bitbit equals maxfd, it indicates that there are no unused fds, and the function can return directly.
Fix this check.
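A worked example with assumed numbers (BITS_PER_LONG == 64 and a 256-entry fd table that is completely full):

    maxfd  = 256
    maxbit = maxfd / BITS_PER_LONG = 4
    find_next_zero_bit(full_fds_bits, maxbit, bitbit) = 4   (no zero bit found)
    bitbit = 4 * BITS_PER_LONG = 256

With the old test, 'bitbit > maxfd' is 256 > 256 and therefore false, so the function keeps scanning the open_fds bitmap even though every fd is taken; with 'bitbit >= maxfd' it returns maxfd immediately.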
Signed-off-by: Yuntao Wang yuntao.wang@linux.dev
Link: https://lore.kernel.org/r/20240529160656.209352-1-yuntao.wang@linux.dev
Reviewed-by: Jan Kara jack@suse.cz
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 3e4a4dfa38fca..a815f6eddc511 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -481,12 +481,12 @@ struct files_struct init_files = {
 
 static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
 {
-	unsigned int maxfd = fdt->max_fds;
+	unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
 	unsigned int maxbit = maxfd / BITS_PER_LONG;
 	unsigned int bitbit = start / BITS_PER_LONG;
 
 	bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_PER_LONG;
-	if (bitbit > maxfd)
+	if (bitbit >= maxfd)
 		return maxfd;
 	if (bitbit > start)
 		start = bitbit;
From: Alex Williamson alex.williamson@redhat.com
[ Upstream commit b7c5e64fecfa88764791679cca4786ac65de739e ]
By linking all the device fds we provide to userspace to an address space through a new pseudo fs, we can use tools like unmap_mapping_range() to zap all vmas associated with a device.
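For illustration, a consumer of this infrastructure could invalidate every userspace mapping of a device with a single call along these lines (a hedged sketch; vfio_zap_all_mappings() is a hypothetical helper name, not something introduced by this patch):

/* Hypothetical helper, for illustration only. */
static void vfio_zap_all_mappings(struct vfio_device *device)
{
	/*
	 * All open files for this device share device->inode->i_mapping,
	 * so one call unmaps the vmas of every mmap of the device.
	 */
	unmap_mapping_range(device->inode->i_mapping, 0, 0, true);
}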
Suggested-by: Jason Gunthorpe jgg@nvidia.com
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
Link: https://lore.kernel.org/r/20240530045236.1005864-2-alex.williamson@redhat.co...
Signed-off-by: Alex Williamson alex.williamson@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/vfio/device_cdev.c |  7 ++++++
 drivers/vfio/group.c       |  7 ++++++
 drivers/vfio/vfio_main.c   | 44 ++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h       |  1 +
 4 files changed, 59 insertions(+)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index e75da0a70d1f8..bb1817bd4ff31 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -39,6 +39,13 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 
 	filep->private_data = df;
 
+	/*
+	 * Use the pseudo fs inode on the device to link all mmaps
+	 * to the same address space, allowing us to unmap all vmas
+	 * associated to this device using unmap_mapping_range().
+	 */
+	filep->f_mapping = device->inode->i_mapping;
+
 	return 0;
 
 err_put_registration:
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 610a429c61912..ded364588d297 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -286,6 +286,13 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	 */
 	filep->f_mode |= (FMODE_PREAD | FMODE_PWRITE);
 
+	/*
+	 * Use the pseudo fs inode on the device to link all mmaps
+	 * to the same address space, allowing us to unmap all vmas
+	 * associated to this device using unmap_mapping_range().
+	 */
+	filep->f_mapping = device->inode->i_mapping;
+
 	if (device->group->type == VFIO_NO_IOMMU)
 		dev_warn(device->dev, "vfio-noiommu device opened by user "
 			 "(%s:%d)\n", current->comm, task_pid_nr(current));
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 40732e8ed4c6f..a205d3a4e379e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -22,8 +22,10 @@
 #include <linux/list.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
+#include <linux/mount.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/pseudo_fs.h>
 #include <linux/rwsem.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -43,9 +45,13 @@
 #define DRIVER_AUTHOR	"Alex Williamson alex.williamson@redhat.com"
 #define DRIVER_DESC	"VFIO - User Level meta-driver"
 
+#define VFIO_MAGIC 0x5646494f /* "VFIO" */
+
 static struct vfio {
 	struct class *device_class;
 	struct ida device_ida;
+	struct vfsmount *vfs_mount;
+	int fs_count;
 } vfio;
 
 #ifdef CONFIG_VFIO_NOIOMMU
@@ -186,6 +192,8 @@ static void vfio_device_release(struct device *dev)
 	if (device->ops->release)
 		device->ops->release(device);
 
+	iput(device->inode);
+	simple_release_fs(&vfio.vfs_mount, &vfio.fs_count);
 	kvfree(device);
 }
 
@@ -228,6 +236,34 @@ struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(_vfio_alloc_device);
 
+static int vfio_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, VFIO_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type vfio_fs_type = {
+	.name = "vfio",
+	.owner = THIS_MODULE,
+	.init_fs_context = vfio_fs_init_fs_context,
+	.kill_sb = kill_anon_super,
+};
+
+static struct inode *vfio_fs_inode_new(void)
+{
+	struct inode *inode;
+	int ret;
+
+	ret = simple_pin_fs(&vfio_fs_type, &vfio.vfs_mount, &vfio.fs_count);
+	if (ret)
+		return ERR_PTR(ret);
+
+	inode = alloc_anon_inode(vfio.vfs_mount->mnt_sb);
+	if (IS_ERR(inode))
+		simple_release_fs(&vfio.vfs_mount, &vfio.fs_count);
+
+	return inode;
+}
+
 /*
  * Initialize a vfio_device so it can be registered to vfio core.
  */
@@ -246,6 +282,11 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	init_completion(&device->comp);
 	device->dev = dev;
 	device->ops = ops;
+	device->inode = vfio_fs_inode_new();
+	if (IS_ERR(device->inode)) {
+		ret = PTR_ERR(device->inode);
+		goto out_inode;
+	}
 
 	if (ops->init) {
 		ret = ops->init(device);
@@ -260,6 +301,9 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	return 0;
 
 out_uninit:
+	iput(device->inode);
+	simple_release_fs(&vfio.vfs_mount, &vfio.fs_count);
+out_inode:
 	vfio_release_device_set(device);
 	ida_free(&vfio.device_ida, device->index);
 	return ret;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5ac5f182ce0bb..514a7f9b3ef4b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -64,6 +64,7 @@ struct vfio_device {
 	struct completion comp;
 	struct iommufd_access *iommufd_access;
 	void (*put_kvm)(struct kvm *kvm);
+	struct inode *inode;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
 	u8 iommufd_attached:1;
From: Alexander Usyskin alexander.usyskin@intel.com
[ Upstream commit 1db5322b7e6b58e1b304ce69a50e9dca798ca95b ]
Change level for the "not connected" client message in the write callback from error to debug.
The MEI driver currently disconnects all clients upon system suspend. This behavior is by design and user-space applications with open connections before the suspend are expected to handle errors upon resume, by reopening their handles, reconnecting, and retrying their operations.
However, the current driver implementation logs an error message every time a write operation is attempted on a disconnected client. Since this is a normal and expected flow after system resume, logging it as an error can be misleading.
Signed-off-by: Alexander Usyskin alexander.usyskin@intel.com
Signed-off-by: Tomas Winkler tomas.winkler@intel.com
Link: https://lore.kernel.org/r/20240530091415.725247-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/misc/mei/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c
index bb4e9eabda978..c018534c780f9 100644
--- a/drivers/misc/mei/main.c
+++ b/drivers/misc/mei/main.c
@@ -329,7 +329,7 @@ static ssize_t mei_write(struct file *file, const char __user *ubuf,
 	}
 
 	if (!mei_cl_is_connected(cl)) {
-		cl_err(dev, cl, "is not connected");
+		cl_dbg(dev, cl, "is not connected");
 		rets = -ENODEV;
 		goto out;
 	}
From: "Ritesh Harjani (IBM)" ritesh.list@gmail.com
[ Upstream commit f5ceb1bbc98c69536d4673a97315e8427e67de1b ]
If the extent spans the block that contains i_size, we need to handle both halves separately so that we properly zero data in the page cache for blocks that are entirely outside of i_size. But this is needed only when i_size falls within the folio currently being processed. "orig_pos + length > isize" can be true for every folio if the mapped extent length is greater than the folio size. That truncates plen for every folio instead of only the last one.
So use orig_plen for checking if "orig_pos + orig_plen > isize".
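To make the difference concrete, a worked example with assumed geometry (16 KiB folios, i_size just past 1 MiB, a 2 MiB mapped extent starting at offset 0):

For the folio at offset 0:
    orig_pos  = 0
    length    = 2 MiB         (remaining extent length)
    orig_plen = 16 KiB        (the part that fits in this folio)
    isize     = 1 MiB + 2 KiB

    old check: orig_pos + length    > isize  ->  2 MiB  > 1 MiB + 2 KiB  -> true
    new check: orig_pos + orig_plen > isize  ->  16 KiB > 1 MiB + 2 KiB  -> false

So the i_size trimming used to run for every folio of the extent, but with orig_plen it only runs for the folio that actually contains i_size.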
Signed-off-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
Link: https://lore.kernel.org/r/a32e5f9a4fcfdb99077300c4020ed7ae61d6e0f9.171506705...
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Jan Kara jack@suse.cz
cc: Ojaswin Mujoo ojaswin@linux.ibm.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/iomap/buffered-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 129a85633797a..975fd88c1f0f4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -201,6 +201,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	unsigned block_size = (1 << block_bits);
 	size_t poff = offset_in_folio(folio, *pos);
 	size_t plen = min_t(loff_t, folio_size(folio) - poff, length);
+	size_t orig_plen = plen;
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
@@ -237,7 +238,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	 * handle both halves separately so that we properly zero data in the
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
-	if (orig_pos <= isize && orig_pos + length > isize) {
+	if (orig_pos <= isize && orig_pos + orig_plen > isize) {
 		unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 1f3512cdf8299f9edaea9046d53ea324a7730bab ]
The core already sets .owner in platform_driver_register(), so the driver does not need to. Whatever is set here will be overwritten anyway when the main driver calls platform_driver_register().
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Signed-off-by: Inki Dae inki.dae@samsung.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/exynos/exynos_dp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_dp.c b/drivers/gpu/drm/exynos/exynos_dp.c
index 3404ec1367fb9..71ee824c4140b 100644
--- a/drivers/gpu/drm/exynos/exynos_dp.c
+++ b/drivers/gpu/drm/exynos/exynos_dp.c
@@ -288,7 +288,6 @@ struct platform_driver dp_driver = {
 	.remove = exynos_dp_remove,
 	.driver = {
 		.name = "exynos-dp",
-		.owner = THIS_MODULE,
 		.pm = pm_ptr(&exynos_dp_pm_ops),
 		.of_match_table = exynos_dp_match,
 	},
From: Tobias Jakobi tjakobi@math.uni-bielefeld.de
[ Upstream commit f74fb5df429ebc6a614dc5aa9e44d7194d402e5a ]
Like the other Aya Neo devices, this one again features a portrait screen, here with a native resolution of 1600x2560.
Signed-off-by: Tobias Jakobi tjakobi@math.uni-bielefeld.de
Reviewed-by: Hans de Goede hdegoede@redhat.com
Signed-off-by: Hans de Goede hdegoede@redhat.com
Link: https://patchwork.freedesktop.org/patch/msgid/20240310220401.895591-1-tjakob...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/drm_panel_orientation_quirks.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/drm_panel_orientation_quirks.c b/drivers/gpu/drm/drm_panel_orientation_quirks.c
index aa93129c3397e..2166208a961d6 100644
--- a/drivers/gpu/drm/drm_panel_orientation_quirks.c
+++ b/drivers/gpu/drm/drm_panel_orientation_quirks.c
@@ -202,6 +202,12 @@ static const struct dmi_system_id orientation_data[] = {
 		  DMI_MATCH(DMI_BOARD_NAME, "NEXT"),
 		},
 		.driver_data = (void *)&lcd800x1280_rightside_up,
+	}, {	/* AYA NEO KUN */
+		.matches = {
+		  DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "AYANEO"),
+		  DMI_MATCH(DMI_BOARD_NAME, "KUN"),
+		},
+		.driver_data = (void *)&lcd1600x2560_rightside_up,
 	}, {	/* Chuwi HiBook (CWI514) */
 		.matches = {
 		  DMI_MATCH(DMI_BOARD_VENDOR, "Hampoo"),
From: Douglas Anderson dianders@chromium.org
[ Upstream commit c38896ca6318c2df20bbe6c8e3f633e071fda910 ]
Based on grepping through the source code, this driver appears to be missing a call to drm_atomic_helper_shutdown() at system shutdown time. Among other things, this means that if a panel is in use, it won't be cleanly powered off at system shutdown time.
The fact that we should call drm_atomic_helper_shutdown() in the case of OS shutdown/restart comes straight out of the kernel doc "driver instance overview" in drm_drv.c.
This driver uses the component model, and shutdown happens in the base driver. The "drvdata" for this driver will always be valid if shutdown() is called, and as of commit 2a073968289d ("drm/atomic-helper: drm_atomic_helper_shutdown(NULL) should be a noop") we don't need to confirm that "drm" is non-NULL.
Suggested-by: Maxime Ripard mripard@kernel.org
Reviewed-by: Maxime Ripard mripard@kernel.org
Reviewed-by: Fei Shao fshao@chromium.org
Tested-by: Fei Shao fshao@chromium.org
Signed-off-by: Douglas Anderson dianders@chromium.org
Signed-off-by: Maxime Ripard mripard@kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240611102744.v2.1.I2b014f90a...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index cdd506c803733..37d8113ba92f0 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -926,6 +926,13 @@ static void mtk_drm_remove(struct platform_device *pdev)
 		of_node_put(private->comp_node[i]);
 }
 
+static void mtk_drm_shutdown(struct platform_device *pdev)
+{
+	struct mtk_drm_private *private = platform_get_drvdata(pdev);
+
+	drm_atomic_helper_shutdown(private->drm);
+}
+
 static int mtk_drm_sys_prepare(struct device *dev)
 {
 	struct mtk_drm_private *private = dev_get_drvdata(dev);
@@ -957,6 +964,7 @@ static const struct dev_pm_ops mtk_drm_pm_ops = {
 static struct platform_driver mtk_drm_platform_driver = {
 	.probe = mtk_drm_probe,
 	.remove_new = mtk_drm_remove,
+	.shutdown = mtk_drm_shutdown,
 	.driver = {
 		.name = "mediatek-drm",
 		.pm = &mtk_drm_pm_ops,
From: Chunguang Xu chunguang.xu@shopee.com
[ Upstream commit e5d574ab37f5f2e7937405613d9b1a724811e5ad ]
If a discard request needs to be retried, and that retry fails before a new special payload is added, a double free will result. Clear the RQF_SPECIAL_PAYLOAD flag when the request is cleaned up.
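For context, the sequence leading to the double free is roughly the following (paraphrased; the exact call path is an assumption based on how nvme_setup_discard() and nvme_cleanup_cmd() interact, not taken verbatim from the patch):

1. nvme_setup_discard() attaches the discard range buffer as a special
   payload and sets RQF_SPECIAL_PAYLOAD on the request.
2. The command fails; nvme_cleanup_cmd() frees that payload, but the flag
   stays set on the request.
3. The retry fails again before a new payload is attached, so
   nvme_cleanup_cmd() sees the stale flag and frees the buffer a second time.

Clearing RQF_SPECIAL_PAYLOAD in nvme_cleanup_cmd() breaks step 3.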
Signed-off-by: Chunguang Xu chunguang.xu@shopee.com
Reviewed-by: Sagi Grimberg sagi@grimberg.me
Reviewed-by: Max Gurtovoy mgurtovoy@nvidia.com
Signed-off-by: Keith Busch kbusch@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/nvme/host/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 94a0916f9cb78..e969da0a681b4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -938,6 +938,7 @@ void nvme_cleanup_cmd(struct request *req)
 			clear_bit_unlock(0, &ctrl->discard_page_busy);
 		else
 			kfree(bvec_virt(&req->special_vec));
+		req->rq_flags &= ~RQF_SPECIAL_PAYLOAD;
 	}
 }
 EXPORT_SYMBOL_GPL(nvme_cleanup_cmd);
From: Daniel Wagner dwagner@suse.de
[ Upstream commit cd0c1b8e045a8d2785342b385cb2684d9b48e426 ]
The spec doesn't mandate that the first two double words (aka the result) of the completion queue entry be set to 0 when they are not used (not specified). Still, the target implementation returns 0 for TCP and FC, but not for RDMA.
Let's make RDMA behave the same by explicitly initializing the result field. This prevents leaking any data from the stack.
Signed-off-by: Daniel Wagner dwagner@suse.de
Reviewed-by: Christoph Hellwig hch@lst.de
Signed-off-by: Keith Busch kbusch@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/nvme/target/core.c             | 1 +
 drivers/nvme/target/fabrics-cmd-auth.c | 3 ---
 drivers/nvme/target/fabrics-cmd.c      | 6 ------
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index ce7e945cb4f7e..61bf1bc1ac243 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -936,6 +936,7 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 	req->metadata_sg_cnt = 0;
 	req->transfer_len = 0;
 	req->metadata_len = 0;
+	req->cqe->result.u64 = 0;
 	req->cqe->status = 0;
 	req->cqe->sq_head = 0;
 	req->ns = NULL;
diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
index 1d9854484e2e8..c103eba96350e 100644
--- a/drivers/nvme/target/fabrics-cmd-auth.c
+++ b/drivers/nvme/target/fabrics-cmd-auth.c
@@ -332,7 +332,6 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
 	pr_debug("%s: ctrl %d qid %d nvme status %x error loc %d\n",
 		 __func__, ctrl->cntlid, req->sq->qid, status, req->error_loc);
-	req->cqe->result.u64 = 0;
 	if (req->sq->dhchap_step != NVME_AUTH_DHCHAP_MESSAGE_SUCCESS2 &&
 	    req->sq->dhchap_step != NVME_AUTH_DHCHAP_MESSAGE_FAILURE2) {
 		unsigned long auth_expire_secs = ctrl->kato ? ctrl->kato : 120;
@@ -515,8 +514,6 @@ void nvmet_execute_auth_receive(struct nvmet_req *req)
 	status = nvmet_copy_to_sgl(req, 0, d, al);
 	kfree(d);
 done:
-	req->cqe->result.u64 = 0;
-
 	if (req->sq->dhchap_step == NVME_AUTH_DHCHAP_MESSAGE_SUCCESS2)
 		nvmet_auth_sq_free(req->sq);
 	else if (req->sq->dhchap_step == NVME_AUTH_DHCHAP_MESSAGE_FAILURE1) {
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index d8da840a1c0ed..fa9e8dc921539 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -225,9 +225,6 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 	if (status)
 		goto out;
 
-	/* zero out initial completion result, assign values as needed */
-	req->cqe->result.u32 = 0;
-
 	if (c->recfmt != 0) {
 		pr_warn("invalid connect version (%d).\n",
 			le16_to_cpu(c->recfmt));
@@ -305,9 +302,6 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
 	if (status)
 		goto out;
 
-	/* zero out initial completion result, assign values as needed */
-	req->cqe->result.u32 = 0;
-
 	if (c->recfmt != 0) {
 		pr_warn("invalid connect version (%d).\n",
 			le16_to_cpu(c->recfmt));
linux-stable-mirror@lists.linaro.org