Chuck Lever chuck.lever@oracle.com wrote:
On Sat, Nov 16, 2024 at 01:33:19AM +0900, Jeongjun Park wrote:
2024년 11월 16일 (토) 오전 1:25, Chuck Lever chuck.lever@oracle.com님이 작성:
On Fri, Nov 15, 2024 at 11:04:56AM -0500, Chuck Lever wrote:
I've found that NFS access to an exported tmpfs file system hangs indefinitely when the client first performs a GETATTR. The hanging nfsd thread is waiting for the inode lock in shmem_getattr():
task:nfsd state:D stack:0 pid:1775 tgid:1775 ppid:2 flags:0x00004000 Call Trace:
<TASK> __schedule+0x770/0x7b0 schedule+0x33/0x50 schedule_preempt_disabled+0x19/0x30 rwsem_down_read_slowpath+0x206/0x230 down_read+0x3f/0x60 shmem_getattr+0x84/0xf0 vfs_getattr_nosec+0x9e/0xc0 vfs_getattr+0x49/0x50 fh_getattr+0x43/0x50 [nfsd] fh_fill_pre_attrs+0x4e/0xd0 [nfsd] nfsd4_open+0x51f/0x910 [nfsd] nfsd4_proc_compound+0x492/0x5d0 [nfsd] nfsd_dispatch+0x117/0x1f0 [nfsd] svc_process_common+0x3b2/0x5e0 [sunrpc] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd] svc_process+0xcf/0x130 [sunrpc] svc_recv+0x64e/0x750 [sunrpc] ? __wake_up_bit+0x4b/0x60 ? __pfx_nfsd+0x10/0x10 [nfsd] nfsd+0xc6/0xf0 [nfsd] kthread+0xed/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2e/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK>
I bisected the problem to:
d949d1d14fa281ace388b1de978e8f2cd52875cf is the first bad commit commit d949d1d14fa281ace388b1de978e8f2cd52875cf Author: Jeongjun Park aha310510@gmail.com AuthorDate: Mon Sep 9 21:35:58 2024 +0900 Commit: Andrew Morton akpm@linux-foundation.org CommitDate: Mon Oct 28 21:40:39 2024 -0700
mm: shmem: fix data-race in shmem_getattr()
...
Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reported-by: syzbot <syzkaller@googlegroup.com> Cc: Hugh Dickins <hughd@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
which first appeared in v6.12-rc6, and adds the line that is waiting on the inode lock when my NFS server hangs.
I haven't yet found the process that is holding the inode lock for this inode.
It is likely that the caller (nfsd4_open()-> fh_fill_pre_attrs()) is already holding the inode semaphore in this case.
Thanks for letting me know!
It seems that the previous patch I wrote was wrong in how to prevent data-race. It seems that the problem occurs in nfsd because nfsd4_create_file() already holds the inode_lock.
After further analysis, I found that this data-race mainly occurs when vfs_statx_path does not acquire the inode_lock, and in other filesystems, it is confirmed that inode_lock is acquired in many cases, so I will send a new patch that fixes this problem right away.
Thanks for your quick response!
My brief sample of file system ->getattr methods shows that these functions do not grab the inode semaphore at all when calling generic_fillattr(). Likely they expect the method's caller to take it.
I strongly prefer to see this commit reverted for v6.12-rc first, and then the new fix should be merged via a normal merge window to permit a lengthy period of testing.
Hmm... Of course, revert this patch is not a bad idea, but I think that patching it like below can effectively prevent data-race without causing recursive locking:
--- mm/shmem.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c index e87f5d6799a7..d061f8b34d49 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1153,6 +1153,12 @@ static int shmem_getattr(struct mnt_idmap *idmap, { struct inode *inode = path->dentry->d_inode; struct shmem_inode_info *info = SHMEM_I(inode); + bool inode_locked = NULL; + + if (!inode_is_locked(inode)) { + inode_lock_shared(inode); + inode_locked = true; + }
if (info->alloced - info->swapped != inode->i_mapping->nrpages) shmem_recalc_inode(inode, 0, 0); @@ -1166,9 +1172,7 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); - inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); - inode_unlock_shared(inode);
if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; @@ -1179,6 +1183,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->btime.tv_nsec = info->i_crtime.tv_nsec; }
+ if (inode_locked) + inode_unlock_shared(inode); + return 0; }
--
What do you think?
Regards,
Jeongjun Park
Because this commit addresses only a KCSAN splat that has been present since v4.3, and does not address a reported behavioral issue, I respectfully request that this commit be reverted immediately so that it does not appear in v6.12 final. Troubleshooting and testing should continue until a fix to the KCSAN issue can be found that does not deadlock NFS exports of tmpfs.
-- Chuck Lever
-- Chuck Lever
-- Chuck Lever