From: Johannes Berg johannes.berg@intel.com
[ Upstream commit b2296eeac91555bd13f774efa7ab7d4b12fb71ef ]
Now that UML has PCI support, this driver must depend also on !UML since it pokes at X86_64 architecture internals that don't exist on ARCH=um.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Johannes Berg johannes.berg@intel.com Acked-by: Dave Jiang dave.jiang@intel.com Acked-By: Anton Ivanov anton.ivanov@cambridgegreys.com Link: https://lore.kernel.org/r/20210625103810.fe877ae0aef4.If240438e3f50ae226f3f7... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index f28bb2334e74..3a745e8a0f42 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -285,7 +285,7 @@ config INTEL_IDMA64
config INTEL_IDXD tristate "Intel Data Accelerators support" - depends on PCI && X86_64 + depends on PCI && X86_64 && !UML depends on PCI_MSI depends on SBITMAP select DMA_ENGINE
From: Zou Wei zou_wei@huawei.com
[ Upstream commit 4faee8b65ec32346f8096e64c5fa1d5a73121742 ]
This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this driver when it is built as an external module.
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zou Wei zou_wei@huawei.com Reviewed-by: Baolin Wang baolin.wang7@gmail.com Link: https://lore.kernel.org/r/1620094977-70146-1-git-send-email-zou_wei@huawei.c... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/sprd-dma.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c index 0ef5ca81ba4d..4357d2395e6b 100644 --- a/drivers/dma/sprd-dma.c +++ b/drivers/dma/sprd-dma.c @@ -1265,6 +1265,7 @@ static const struct of_device_id sprd_dma_match[] = { { .compatible = "sprd,sc9860-dma", }, {}, }; +MODULE_DEVICE_TABLE(of, sprd_dma_match);
static int __maybe_unused sprd_dma_runtime_suspend(struct device *dev) {
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit bbac7a92a46f0876e588722ebe552ddfe6fd790f ]
Now that UML has PCI support, this driver must depend also on !UML since it pokes at X86_64 architecture internals that don't exist on ARCH=um.
Reported-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Johannes Berg johannes.berg@intel.com Acked-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20210809112409.a3a0974874d2.I2ffe3d11ed37f735da2f3... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 3a745e8a0f42..08013345d1f2 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -299,7 +299,7 @@ config INTEL_IDXD
config INTEL_IOATDMA tristate "Intel I/OAT DMA support" - depends on PCI && X86_64 + depends on PCI && X86_64 && !UML select DMA_ENGINE select DMA_ENGINE_RAID select DCA
From: Radhey Shyam Pandey radhey.shyam.pandey@xilinx.com
[ Upstream commit aac6c0f90799d66b8989be1e056408f33fd99fe6 ]
The xilinx dma driver uses the consistent allocations, so for correct operation also set the DMA mask for coherent APIs. It fixes the below kernel crash with dmatest client when DMA IP is configured with 64-bit address width and linux is booted from high (>4GB) memory.
Call trace: [ 489.531257] dma_alloc_from_pool+0x8c/0x1c0 [ 489.535431] dma_direct_alloc+0x284/0x330 [ 489.539432] dma_alloc_attrs+0x80/0xf0 [ 489.543174] dma_pool_alloc+0x160/0x2c0 [ 489.547003] xilinx_cdma_prep_memcpy+0xa4/0x180 [ 489.551524] dmatest_func+0x3cc/0x114c [ 489.555266] kthread+0x124/0x130 [ 489.558486] ret_from_fork+0x10/0x3c [ 489.562051] ---[ end trace 248625b2d596a90a ]---
Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@xilinx.com Reviewed-by: Harini Katakam harini.katakam@xilinx.com Link: https://lore.kernel.org/r/1629363528-30347-1-git-send-email-radhey.shyam.pan... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/xilinx/xilinx_dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c index 9ffdbeec436b..cab4719e4cf9 100644 --- a/drivers/dma/xilinx/xilinx_dma.c +++ b/drivers/dma/xilinx/xilinx_dma.c @@ -3070,7 +3070,7 @@ static int xilinx_dma_probe(struct platform_device *pdev) xdev->ext_addr = false;
/* Set the dma mask bits */ - dma_set_mask(xdev->dev, DMA_BIT_MASK(addr_width)); + dma_set_mask_and_coherent(xdev->dev, DMA_BIT_MASK(addr_width));
/* Initialize the DMA engine */ xdev->common.dev = &pdev->dev;
From: Jeff Layton jlayton@kernel.org
[ Upstream commit b11ed50346683a749632ea664959b28d524d7395 ]
The current code will update the mtime and then try to get caps to handle the write. If we end up having to request caps from the MDS, then the mtime in the cap grant will clobber the updated mtime and it'll be lost.
This is most noticable when two clients are alternately writing to the same file. Fw caps are continually being granted and revoked, and the mtime ends up stuck because the updated mtimes are always being overwritten with the old one.
Fix this by changing the order of operations in ceph_write_iter to get the caps before updating the times. Also, make sure we check the pool full conditions before even getting any caps or uninlining.
URL: https://tracker.ceph.com/issues/46574 Reported-by: Jozef Kováč kovac@firma.zoznam.sk Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: Xiubo Li xiubli@redhat.com Reviewed-by: Luis Henriques lhenriques@suse.de Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/file.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 3d2e3dd4ee01..f1895f78ab45 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1723,32 +1723,26 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) goto out; }
- err = file_remove_privs(file); - if (err) + down_read(&osdc->lock); + map_flags = osdc->osdmap->flags; + pool_flags = ceph_pg_pool_flags(osdc->osdmap, ci->i_layout.pool_id); + up_read(&osdc->lock); + if ((map_flags & CEPH_OSDMAP_FULL) || + (pool_flags & CEPH_POOL_FLAG_FULL)) { + err = -ENOSPC; goto out; + }
- err = file_update_time(file); + err = file_remove_privs(file); if (err) goto out;
- inode_inc_iversion_raw(inode); - if (ci->i_inline_version != CEPH_INLINE_NONE) { err = ceph_uninline_data(file, NULL); if (err < 0) goto out; }
- down_read(&osdc->lock); - map_flags = osdc->osdmap->flags; - pool_flags = ceph_pg_pool_flags(osdc->osdmap, ci->i_layout.pool_id); - up_read(&osdc->lock); - if ((map_flags & CEPH_OSDMAP_FULL) || - (pool_flags & CEPH_POOL_FLAG_FULL)) { - err = -ENOSPC; - goto out; - } - dout("aio_write %p %llx.%llx %llu~%zd getting caps. i_size %llu\n", inode, ceph_vinop(inode), pos, count, i_size_read(inode)); if (fi->fmode & CEPH_FILE_MODE_LAZY) @@ -1761,6 +1755,12 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) if (err < 0) goto out;
+ err = file_update_time(file); + if (err) + goto out_caps; + + inode_inc_iversion_raw(inode); + dout("aio_write %p %llx.%llx %llu~%zd got cap refs on %s\n", inode, ceph_vinop(inode), pos, count, ceph_cap_string(got));
@@ -1844,6 +1844,8 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) }
goto out_unlocked; +out_caps: + ceph_put_cap_refs(ci, got); out: if (direct_lock) ceph_end_io_direct(inode);
From: Xiubo Li xiubli@redhat.com
[ Upstream commit a6d37ccdd240e80f26aaea0e62cda310e0e184d7 ]
capsnaps will take inode references via ihold when queueing to flush. When force unmounting, the client will just close the sessions and may never get a flush reply, causing a leak and inode ref leak.
Fix this by removing the capsnaps for an inode when removing the caps.
URL: https://tracker.ceph.com/issues/52295 Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/caps.c | 68 +++++++++++++++++++++++++++++++++----------- fs/ceph/mds_client.c | 31 +++++++++++++++++++- fs/ceph/super.h | 6 ++++ 3 files changed, 87 insertions(+), 18 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index b864c9b9e8df..a975a9354ee5 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -3166,7 +3166,16 @@ void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr, break; } } - BUG_ON(!found); + + if (!found) { + /* + * The capsnap should already be removed when removing + * auth cap in the case of a forced unmount. + */ + WARN_ON_ONCE(ci->i_auth_cap); + goto unlock; + } + capsnap->dirty_pages -= nr; if (capsnap->dirty_pages == 0) { complete_capsnap = true; @@ -3188,6 +3197,7 @@ void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr, complete_capsnap ? " (complete capsnap)" : ""); }
+unlock: spin_unlock(&ci->i_ceph_lock);
if (last) { @@ -3654,6 +3664,43 @@ static void handle_cap_flush_ack(struct inode *inode, u64 flush_tid, iput(inode); }
+void __ceph_remove_capsnap(struct inode *inode, struct ceph_cap_snap *capsnap, + bool *wake_ci, bool *wake_mdsc) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc; + bool ret; + + lockdep_assert_held(&ci->i_ceph_lock); + + dout("removing capsnap %p, inode %p ci %p\n", capsnap, inode, ci); + + list_del_init(&capsnap->ci_item); + ret = __detach_cap_flush_from_ci(ci, &capsnap->cap_flush); + if (wake_ci) + *wake_ci = ret; + + spin_lock(&mdsc->cap_dirty_lock); + if (list_empty(&ci->i_cap_flush_list)) + list_del_init(&ci->i_flushing_item); + + ret = __detach_cap_flush_from_mdsc(mdsc, &capsnap->cap_flush); + if (wake_mdsc) + *wake_mdsc = ret; + spin_unlock(&mdsc->cap_dirty_lock); +} + +void ceph_remove_capsnap(struct inode *inode, struct ceph_cap_snap *capsnap, + bool *wake_ci, bool *wake_mdsc) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + + lockdep_assert_held(&ci->i_ceph_lock); + + WARN_ON_ONCE(capsnap->dirty_pages || capsnap->writing); + __ceph_remove_capsnap(inode, capsnap, wake_ci, wake_mdsc); +} + /* * Handle FLUSHSNAP_ACK. MDS has flushed snap data to disk and we can * throw away our cap_snap. @@ -3691,23 +3738,10 @@ static void handle_cap_flushsnap_ack(struct inode *inode, u64 flush_tid, capsnap, capsnap->follows); } } - if (flushed) { - WARN_ON(capsnap->dirty_pages || capsnap->writing); - dout(" removing %p cap_snap %p follows %lld\n", - inode, capsnap, follows); - list_del(&capsnap->ci_item); - wake_ci |= __detach_cap_flush_from_ci(ci, &capsnap->cap_flush); - - spin_lock(&mdsc->cap_dirty_lock); - - if (list_empty(&ci->i_cap_flush_list)) - list_del_init(&ci->i_flushing_item); - - wake_mdsc |= __detach_cap_flush_from_mdsc(mdsc, - &capsnap->cap_flush); - spin_unlock(&mdsc->cap_dirty_lock); - } + if (flushed) + ceph_remove_capsnap(inode, capsnap, &wake_ci, &wake_mdsc); spin_unlock(&ci->i_ceph_lock); + if (flushed) { ceph_put_snap_context(capsnap->context); ceph_put_cap_snap(capsnap); diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 816cea497537..f2c5c264b228 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1585,14 +1585,39 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session, return ret; }
+static int remove_capsnaps(struct ceph_mds_client *mdsc, struct inode *inode) +{ + struct ceph_inode_info *ci = ceph_inode(inode); + struct ceph_cap_snap *capsnap; + int capsnap_release = 0; + + lockdep_assert_held(&ci->i_ceph_lock); + + dout("removing capsnaps, ci is %p, inode is %p\n", ci, inode); + + while (!list_empty(&ci->i_cap_snaps)) { + capsnap = list_first_entry(&ci->i_cap_snaps, + struct ceph_cap_snap, ci_item); + __ceph_remove_capsnap(inode, capsnap, NULL, NULL); + ceph_put_snap_context(capsnap->context); + ceph_put_cap_snap(capsnap); + capsnap_release++; + } + wake_up_all(&ci->i_cap_wq); + wake_up_all(&mdsc->cap_flushing_wq); + return capsnap_release; +} + static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg) { struct ceph_fs_client *fsc = (struct ceph_fs_client *)arg; + struct ceph_mds_client *mdsc = fsc->mdsc; struct ceph_inode_info *ci = ceph_inode(inode); LIST_HEAD(to_remove); bool dirty_dropped = false; bool invalidate = false; + int capsnap_release = 0;
dout("removing cap %p, ci is %p, inode is %p\n", cap, ci, &ci->vfs_inode); @@ -1600,7 +1625,6 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, __ceph_remove_cap(cap, false); if (!ci->i_auth_cap) { struct ceph_cap_flush *cf; - struct ceph_mds_client *mdsc = fsc->mdsc;
if (READ_ONCE(fsc->mount_state) == CEPH_MOUNT_SHUTDOWN) { if (inode->i_data.nrpages > 0) @@ -1664,6 +1688,9 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, list_add(&ci->i_prealloc_cap_flush->i_list, &to_remove); ci->i_prealloc_cap_flush = NULL; } + + if (!list_empty(&ci->i_cap_snaps)) + capsnap_release = remove_capsnaps(mdsc, inode); } spin_unlock(&ci->i_ceph_lock); while (!list_empty(&to_remove)) { @@ -1680,6 +1707,8 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap, ceph_queue_invalidate(inode); if (dirty_dropped) iput(inode); + while (capsnap_release--) + iput(inode); return 0; }
diff --git a/fs/ceph/super.h b/fs/ceph/super.h index a8c460393b01..9362eeb5812d 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1134,6 +1134,12 @@ extern void ceph_put_cap_refs_no_check_caps(struct ceph_inode_info *ci, int had); extern void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr, struct ceph_snap_context *snapc); +extern void __ceph_remove_capsnap(struct inode *inode, + struct ceph_cap_snap *capsnap, + bool *wake_ci, bool *wake_mdsc); +extern void ceph_remove_capsnap(struct inode *inode, + struct ceph_cap_snap *capsnap, + bool *wake_ci, bool *wake_mdsc); extern void ceph_flush_snaps(struct ceph_inode_info *ci, struct ceph_mds_session **psession); extern bool __ceph_should_report_size(struct ceph_inode_info *ci);
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 3eaf5aa1cfa8c97c72f5824e2e9263d6cc977b03 ]
Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/caps.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index a975a9354ee5..dcd61e33b82e 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -1865,6 +1865,8 @@ static u64 __mark_caps_flushing(struct inode *inode, * try to invalidate mapping pages without blocking. */ static int try_nonblocking_invalidate(struct inode *inode) + __releases(ci->i_ceph_lock) + __acquires(ci->i_ceph_lock) { struct ceph_inode_info *ci = ceph_inode(inode); u32 invalidating_gen = ci->i_rdcache_gen;
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 8f96a5bfa1503e0a5f3c78d51e993a1794d4aff1 ]
We update the ctime/mtime of a block device when we remove it so that blkid knows the device changed. However we do this by re-opening the block device and calling filp_update_time. This is more correct because it'll call the inode->i_op->update_time if it exists, but the block dev inodes do not do this. Instead call generic_update_time() on the bd_inode in order to avoid the blkdev_open path and get rid of the following lockdep splat:
====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc2+ #406 Not tainted ------------------------------------------------------ losetup/11596 is trying to acquire lock: ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
but task is already holding lock: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&lo->lo_mutex){+.+.}-{3:3}: __mutex_lock+0x7d/0x750 lo_open+0x28/0x60 [loop] blkdev_get_whole+0x25/0xf0 blkdev_get_by_dev.part.0+0x168/0x3c0 blkdev_open+0xd2/0xe0 do_dentry_open+0x161/0x390 path_openat+0x3cc/0xa20 do_filp_open+0x96/0x120 do_sys_openat2+0x7b/0x130 __x64_sys_openat+0x46/0x70 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #3 (&disk->open_mutex){+.+.}-{3:3}: __mutex_lock+0x7d/0x750 blkdev_get_by_dev.part.0+0x56/0x3c0 blkdev_open+0xd2/0xe0 do_dentry_open+0x161/0x390 path_openat+0x3cc/0xa20 do_filp_open+0x96/0x120 file_open_name+0xc7/0x170 filp_open+0x2c/0x50 btrfs_scratch_superblocks.part.0+0x10f/0x170 btrfs_rm_device.cold+0xe8/0xed btrfs_ioctl+0x2a31/0x2e70 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #2 (sb_writers#12){.+.+}-{0:0}: lo_write_bvec+0xc2/0x240 [loop] loop_process_work+0x238/0xd00 [loop] process_one_work+0x26b/0x560 worker_thread+0x55/0x3c0 kthread+0x140/0x160 ret_from_fork+0x1f/0x30
-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}: process_one_work+0x245/0x560 worker_thread+0x55/0x3c0 kthread+0x140/0x160 ret_from_fork+0x1f/0x30
-> #0 ((wq_completion)loop0){+.+.}-{0:0}: __lock_acquire+0x10ea/0x1d90 lock_acquire+0xb5/0x2b0 flush_workqueue+0x91/0x5e0 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x660 [loop] block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Chain exists of: (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&lo->lo_mutex); lock(&disk->open_mutex); lock(&lo->lo_mutex); lock((wq_completion)loop0);
*** DEADLOCK ***
1 lock held by losetup/11596: #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
stack backtrace: CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x72 check_noncircular+0xcf/0xf0 ? stack_trace_save+0x3b/0x50 __lock_acquire+0x10ea/0x1d90 lock_acquire+0xb5/0x2b0 ? flush_workqueue+0x67/0x5e0 ? lockdep_init_map_type+0x47/0x220 flush_workqueue+0x91/0x5e0 ? flush_workqueue+0x67/0x5e0 ? verify_cpu+0xf0/0x100 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x660 [loop] ? blkdev_ioctl+0x8d/0x2a0 block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index d1fccddcf403..003960c484a1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1852,15 +1852,17 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans, * Function to update ctime/mtime for a given device path. * Mainly used for ctime/mtime based probe like libblkid. */ -static void update_dev_time(const char *path_name) +static void update_dev_time(struct block_device *bdev) { - struct file *filp; + struct inode *inode = bdev->bd_inode; + struct timespec64 now;
- filp = filp_open(path_name, O_RDWR, 0); - if (IS_ERR(filp)) + /* Shouldn't happen but just in case. */ + if (!inode) return; - file_update_time(filp); - filp_close(filp, NULL); + + now = current_time(inode); + generic_update_time(inode, &now, S_MTIME | S_CTIME); }
static int btrfs_rm_dev_item(struct btrfs_device *device) @@ -2035,7 +2037,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
/* Update ctime/mtime for device path for libblkid */ - update_dev_time(device_path); + update_dev_time(bdev); }
int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, @@ -2678,7 +2680,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path btrfs_forget_devices(device_path);
/* Update ctime/mtime for blkid or udev */ - update_dev_time(device_path); + update_dev_time(bdev);
return ret;
From: Anand Jain anand.jain@oracle.com
[ Upstream commit c124706900c20dee70f921bb3a90492431561a0a ]
Following test case reproduces lockdep warning.
Test case:
$ mkfs.btrfs -f <dev1> $ btrfstune -S 1 <dev1> $ mount <dev1> <mnt> $ btrfs device add <dev2> <mnt> -f $ umount <mnt> $ mount <dev2> <mnt> $ umount <mnt>
The warning claims a possible ABBA deadlock between the threads initiated by [#1] btrfs device add and [#0] the mount.
[ 540.743122] WARNING: possible circular locking dependency detected [ 540.743129] 5.11.0-rc7+ #5 Not tainted [ 540.743135] ------------------------------------------------------ [ 540.743142] mount/2515 is trying to acquire lock: [ 540.743149] ffffa0c5544c2ce0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: clone_fs_devices+0x6d/0x210 [btrfs] [ 540.743458] but task is already holding lock: [ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs] [ 540.743541] which lock already depends on the new lock. [ 540.743543] the existing dependency chain (in reverse order) is:
[ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}: [ 540.743566] down_read_nested+0x48/0x2b0 [ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs] [ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs] [ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs] [ 540.743785] btrfs_update_device+0x83/0x260 [btrfs] [ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- device_list_mutex [ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 [btrfs] [ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs] [ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs] [ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs] [ 540.744166] __x64_sys_ioctl+0xe4/0x140 [ 540.744170] do_syscall_64+0x4b/0x80 [ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}: [ 540.744184] __lock_acquire+0x155f/0x2360 [ 540.744188] lock_acquire+0x10b/0x5c0 [ 540.744190] __mutex_lock+0xb1/0xf80 [ 540.744193] mutex_lock_nested+0x27/0x30 [ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs] [ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs] [ 540.744336] open_ctree+0xf6e/0x2074 [btrfs] [ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs] [ 540.744472] legacy_get_tree+0x38/0x90 [ 540.744475] vfs_get_tree+0x30/0x140 [ 540.744478] fc_mount+0x16/0x60 [ 540.744482] vfs_kern_mount+0x91/0x100 [ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs] [ 540.744536] legacy_get_tree+0x38/0x90 [ 540.744537] vfs_get_tree+0x30/0x140 [ 540.744539] path_mount+0x8d8/0x1070 [ 540.744541] do_mount+0x8d/0xc0 [ 540.744543] __x64_sys_mount+0x125/0x160 [ 540.744545] do_syscall_64+0x4b/0x80 [ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 540.744551] other info that might help us debug this: [ 540.744552] Possible unsafe locking scenario:
[ 540.744553] CPU0 CPU1 [ 540.744554] ---- ---- [ 540.744555] lock(btrfs-chunk-00); [ 540.744557] lock(&fs_devs->device_list_mutex); [ 540.744560] lock(btrfs-chunk-00); [ 540.744562] lock(&fs_devs->device_list_mutex); [ 540.744564] *** DEADLOCK ***
[ 540.744565] 3 locks held by mount/2515: [ 540.744567] #0: ffffa0c56bf7a0e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.isra.16+0xdf/0x450 [ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: btrfs_read_chunk_tree+0x63/0xbb0 [btrfs] [ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs] [ 540.744708] stack backtrace: [ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 5.11.0-rc7+ #5
But the device_list_mutex in clone_fs_devices() is redundant, as explained below. Two threads [1] and [2] (below) could lead to clone_fs_device().
[1] open_ctree <== mount sprout fs btrfs_read_chunk_tree() mutex_lock(&uuid_mutex) <== global lock read_one_dev() open_seed_devices() clone_fs_devices() <== seed fs_devices mutex_lock(&orig->device_list_mutex) <== seed fs_devices
[2] btrfs_init_new_device() <== sprouting mutex_lock(&uuid_mutex); <== global lock btrfs_prepare_sprout() lockdep_assert_held(&uuid_mutex) clone_fs_devices(seed_fs_device) <== seed fs_devices
Both of these threads hold uuid_mutex which is sufficient to protect getting the seed device(s) freed while we are trying to clone it for sprouting [2] or mounting a sprout [1] (as above). A mounted seed device can not free/write/replace because it is read-only. An unmounted seed device can be freed by btrfs_free_stale_devices(), but it needs uuid_mutex. So this patch removes the unnecessary device_list_mutex in clone_fs_devices(). And adds a lockdep_assert_held(&uuid_mutex) in clone_fs_devices().
Reported-by: Su Yue l@damenly.su Tested-by: Su Yue l@damenly.su Signed-off-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 003960c484a1..a01cec6ae5ff 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -568,6 +568,8 @@ static int btrfs_free_stale_devices(const char *path, struct btrfs_device *device, *tmp_device; int ret = 0;
+ lockdep_assert_held(&uuid_mutex); + if (path) ret = -ENOENT;
@@ -999,11 +1001,12 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) struct btrfs_device *orig_dev; int ret = 0;
+ lockdep_assert_held(&uuid_mutex); + fs_devices = alloc_fs_devices(orig->fsid, NULL); if (IS_ERR(fs_devices)) return fs_devices;
- mutex_lock(&orig->device_list_mutex); fs_devices->total_devices = orig->total_devices;
list_for_each_entry(orig_dev, &orig->devices, dev_list) { @@ -1035,10 +1038,8 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) device->fs_devices = fs_devices; fs_devices->num_devices++; } - mutex_unlock(&orig->device_list_mutex); return fs_devices; error: - mutex_unlock(&orig->device_list_mutex); free_fs_devices(fs_devices); return ERR_PTR(ret); }
From: Ohhoon Kwon ohoono.kwon@samsung.com
[ Upstream commit c2f273ebd89a79ed87ef1025753343e327b99ac9 ]
While comm change event via prctl has been reported to proc connector by 'commit f786ecba4158 ("connector: add comm change event report to proc connector")', connector listeners were missing comm changes by explicit writes on /proc/[pid]/comm.
Let explicit writes on /proc/[pid]/comm report to proc connector.
Link: https://lkml.kernel.org/r/20210701133458epcms1p68e9eb9bd0eee8903ba26679a37d9... Signed-off-by: Ohhoon Kwon ohoono.kwon@samsung.com Cc: Ingo Molnar mingo@kernel.org Cc: David S. Miller davem@davemloft.net Cc: Christian Brauner christian.brauner@ubuntu.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: Alexey Dobriyan adobriyan@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/base.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c index 5d52aea8d7e7..d01e09758354 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -96,6 +96,7 @@ #include <linux/posix-timers.h> #include <linux/time_namespace.h> #include <linux/resctrl.h> +#include <linux/cn_proc.h> #include <trace/events/oom.h> #include "internal.h" #include "fd.h" @@ -1675,8 +1676,10 @@ static ssize_t comm_write(struct file *file, const char __user *buf, if (!p) return -ESRCH;
- if (same_thread_group(current, p)) + if (same_thread_group(current, p)) { set_task_comm(p, buffer); + proc_comm_connector(p); + } else count = -EINVAL;
From: Nanyong Sun sunnanyong@huawei.com
[ Upstream commit 5f5dec07aca7067216ed4c1342e464e7307a9197 ]
Patch series "nilfs2: fix incorrect usage of kobject".
This patchset from Nanyong Sun fixes memory leak issues and a NULL pointer dereference issue caused by incorrect usage of kboject in nilfs2 sysfs implementation.
This patch (of 6):
Reported by syzkaller:
BUG: memory leak unreferenced object 0xffff888100ca8988 (size 8): comm "syz-executor.1", pid 1930, jiffies 4294745569 (age 18.052s) hex dump (first 8 bytes): 6c 6f 6f 70 31 00 ff ff loop1... backtrace: kstrdup+0x36/0x70 mm/util.c:60 kstrdup_const+0x35/0x60 mm/util.c:83 kvasprintf_const+0xf1/0x180 lib/kasprintf.c:48 kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xc9/0x150 lib/kobject.c:473 nilfs_sysfs_create_device_group+0x150/0x7d0 fs/nilfs2/sysfs.c:986 init_nilfs+0xa21/0xea0 fs/nilfs2/the_nilfs.c:637 nilfs_fill_super fs/nilfs2/super.c:1046 [inline] nilfs_mount+0x7b4/0xe80 fs/nilfs2/super.c:1316 legacy_get_tree+0x105/0x210 fs/fs_context.c:592 vfs_get_tree+0x8e/0x2d0 fs/super.c:1498 do_new_mount fs/namespace.c:2905 [inline] path_mount+0xf9b/0x1990 fs/namespace.c:3235 do_mount+0xea/0x100 fs/namespace.c:3248 __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount fs/namespace.c:3433 [inline] __x64_sys_mount+0x14b/0x1f0 fs/namespace.c:3433 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
If kobject_init_and_add return with error, then the cleanup of kobject is needed because memory may be allocated in kobject_init_and_add without freeing.
And the place of cleanup_dev_kobject should use kobject_put to free the memory associated with the kobject. As the section "Kobject removal" of "Documentation/core-api/kobject.rst" says, kobject_del() just makes the kobject "invisible", but it is not cleaned up. And no more cleanup will do after cleanup_dev_kobject, so kobject_put is needed here.
Link: https://lkml.kernel.org/r/1625651306-10829-1-git-send-email-konishi.ryusuke@... Link: https://lkml.kernel.org/r/1625651306-10829-2-git-send-email-konishi.ryusuke@... Reported-by: Hulk Robot hulkci@huawei.com Link: https://lkml.kernel.org/r/20210629022556.3985106-2-sunnanyong@huawei.com Signed-off-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/sysfs.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 9c6c0e2e5880..b6a48492fed2 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -999,7 +999,7 @@ int nilfs_sysfs_create_device_group(struct super_block *sb) err = kobject_init_and_add(&nilfs->ns_dev_kobj, &nilfs_dev_ktype, NULL, "%s", sb->s_id); if (err) - goto free_dev_subgroups; + goto cleanup_dev_kobject;
err = nilfs_sysfs_create_mounted_snapshots_group(nilfs); if (err) @@ -1036,9 +1036,7 @@ int nilfs_sysfs_create_device_group(struct super_block *sb) nilfs_sysfs_delete_mounted_snapshots_group(nilfs);
cleanup_dev_kobject: - kobject_del(&nilfs->ns_dev_kobj); - -free_dev_subgroups: + kobject_put(&nilfs->ns_dev_kobj); kfree(nilfs->ns_dev_subgroups);
failed_create_device_group:
From: Nanyong Sun sunnanyong@huawei.com
[ Upstream commit dbc6e7d44a514f231a64d9d5676e001b660b6448 ]
In nilfs_##name##_attr_release, kobj->parent should not be referenced because it is a NULL pointer. The release() method of kobject is always called in kobject_put(kobj), in the implementation of kobject_put(), the kobj->parent will be assigned as NULL before call the release() method. So just use kobj to get the subgroups, which is more efficient and can fix a NULL pointer reference problem.
Link: https://lkml.kernel.org/r/20210629022556.3985106-3-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-3-git-send-email-konishi.ryusuke@... Signed-off-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/sysfs.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index b6a48492fed2..43f660beb9b4 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -64,11 +64,9 @@ static const struct sysfs_ops nilfs_##name##_attr_ops = { \ #define NILFS_DEV_INT_GROUP_TYPE(name, parent_name) \ static void nilfs_##name##_attr_release(struct kobject *kobj) \ { \ - struct nilfs_sysfs_##parent_name##_subgroups *subgroups; \ - struct the_nilfs *nilfs = container_of(kobj->parent, \ - struct the_nilfs, \ - ns_##parent_name##_kobj); \ - subgroups = nilfs->ns_##parent_name##_subgroups; \ + struct nilfs_sysfs_##parent_name##_subgroups *subgroups = container_of(kobj, \ + struct nilfs_sysfs_##parent_name##_subgroups, \ + sg_##name##_kobj); \ complete(&subgroups->sg_##name##_kobj_unregister); \ } \ static struct kobj_type nilfs_##name##_ktype = { \
From: Nanyong Sun sunnanyong@huawei.com
[ Upstream commit 24f8cb1ed057c840728167dab33b32e44147c86f ]
If kobject_init_and_add return with error, kobject_put() is needed here to avoid memory leak, because kobject_init_and_add may return error without freeing the memory associated with the kobject it allocated.
Link: https://lkml.kernel.org/r/20210629022556.3985106-4-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-4-git-send-email-konishi.ryusuke@... Signed-off-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 43f660beb9b4..5dc468ff5903 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -92,8 +92,8 @@ static int nilfs_sysfs_create_##name##_group(struct the_nilfs *nilfs) \ err = kobject_init_and_add(kobj, &nilfs_##name##_ktype, parent, \ #name); \ if (err) \ - return err; \ - return 0; \ + kobject_put(kobj); \ + return err; \ } \ static void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ { \
From: Nanyong Sun sunnanyong@huawei.com
[ Upstream commit a3e181259ddd61fd378390977a1e4e2316853afa ]
The kobject_put() should be used to cleanup the memory associated with the kobject instead of kobject_del. See the section "Kobject removal" of "Documentation/core-api/kobject.rst".
Link: https://lkml.kernel.org/r/20210629022556.3985106-5-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-5-git-send-email-konishi.ryusuke@... Signed-off-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/sysfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 5dc468ff5903..34893a57a7b9 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -97,7 +97,7 @@ static int nilfs_sysfs_create_##name##_group(struct the_nilfs *nilfs) \ } \ static void nilfs_sysfs_delete_##name##_group(struct the_nilfs *nilfs) \ { \ - kobject_del(&nilfs->ns_##parent_name##_subgroups->sg_##name##_kobj); \ + kobject_put(&nilfs->ns_##parent_name##_subgroups->sg_##name##_kobj); \ }
/************************************************************************
From: Nanyong Sun sunnanyong@huawei.com
[ Upstream commit b2fe39c248f3fa4bbb2a20759b4fdd83504190f7 ]
If kobject_init_and_add returns with error, kobject_put() is needed here to avoid memory leak, because kobject_init_and_add may return error without freeing the memory associated with the kobject it allocated.
Link: https://lkml.kernel.org/r/20210629022556.3985106-6-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-6-git-send-email-konishi.ryusuke@... Signed-off-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 34893a57a7b9..44b9ad70a564 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -208,9 +208,9 @@ int nilfs_sysfs_create_snapshot_group(struct nilfs_root *root) }
if (err) - return err; + kobject_put(&root->snapshot_kobj);
- return 0; + return err; }
void nilfs_sysfs_delete_snapshot_group(struct nilfs_root *root)
From: Nanyong Sun sunnanyong@huawei.com
[ Upstream commit 17243e1c3072b8417a5ebfc53065d0a87af7ca77 ]
kobject_put() should be used to cleanup the memory associated with the kobject instead of kobject_del(). See the section "Kobject removal" of "Documentation/core-api/kobject.rst".
Link: https://lkml.kernel.org/r/20210629022556.3985106-7-sunnanyong@huawei.com Link: https://lkml.kernel.org/r/1625651306-10829-7-git-send-email-konishi.ryusuke@... Signed-off-by: Nanyong Sun sunnanyong@huawei.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nilfs2/sysfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c index 44b9ad70a564..57afd06db62d 100644 --- a/fs/nilfs2/sysfs.c +++ b/fs/nilfs2/sysfs.c @@ -215,7 +215,7 @@ int nilfs_sysfs_create_snapshot_group(struct nilfs_root *root)
void nilfs_sysfs_delete_snapshot_group(struct nilfs_root *root) { - kobject_del(&root->snapshot_kobj); + kobject_put(&root->snapshot_kobj); }
/************************************************************************
linux-stable-mirror@lists.linaro.org