From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
[ Upstream commit 53db6f25ee47cb1265141d31562604e56146919a ]
wake_up_bit() is called in ceph_async_unlink_cb(), wake_async_create_waiters(), and ceph_finish_async_create(). It makes sense to switch to the clear_bit() function, because it makes the code much cleaner and easier to understand. The more important change is adding the smp_mb__after_atomic() memory barrier after the bit modification and before the wake_up_bit() call. This can prevent a potential race condition when other threads access the modified bit. Luckily, clear_and_wake_up_bit() already implements the required pattern:
	static inline void clear_and_wake_up_bit(int bit, unsigned long *word)
	{
		clear_bit_unlock(bit, word);
		/* See wake_up_bit() for which memory barrier you need to use. */
		smp_mb__after_atomic();
		wake_up_bit(word, bit);
	}

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES – this patch fixes a real wakeup race in the Ceph client and is well suited for stable backporting.

**Why Backport**
- clear_and_wake_up_bit at fs/ceph/dir.c:1263 and fs/ceph/file.c:582,764 replaces the ad-hoc clear+wake sequence and injects the smp_mb__after_atomic() that wake_up_bit() demands, preventing stale-bit wakeups.
- Waiters in ceph_wait_on_async_create (fs/ceph/mds_client.h:657-662) and ceph_wait_on_conflict_unlink (fs/ceph/mds_client.c:836-885) depend on the cleared bit being visible before the wake; otherwise wake_bit_function (kernel/sched/wait_bit.c:24-34) refuses to wake them and those threads can hang.
- Leveraging the helper (include/linux/wait_bit.h:550-556) also gives the release semantics of the clear_bit_unlock call inside it, so any state published before clearing di->flags or ci->i_ceph_flags becomes observable to the awakened waiters.

**Risk**
- Change is limited to swapping in an existing core helper within Ceph flag handling, so functional risk is minimal and there are no external dependencies.

Next steps: 1) If possible, run CephFS async create/unlink regression tests or exercise the workloads that originally hit the wait-on-bit stalls.
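
The ordering argument above (clear the bit, issue a full barrier, then check for waiters and wake them) can be illustrated with a small user-space analogy. This is only a sketch built on C11 atomics and pthreads, not kernel code: `struct demo_waitq`, `demo_wait_on_bit()`, `demo_clear_and_wake_up_bit()`, `demo_run()` and `DEMO_ASYNC_BIT` are hypothetical names, and the `waiters` counter is an assumed stand-in for the lockless "is anyone waiting?" check that makes the barrier necessary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_ASYNC_BIT 0	/* hypothetical bit number, for illustration only */

/* Hypothetical user-space stand-in for a flag word plus its wait queue. */
struct demo_waitq {
	atomic_ulong flags;	/* word holding the bit being waited on */
	atomic_int waiters;	/* models a lockless "any waiters?" check */
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

static void demo_init(struct demo_waitq *q, unsigned long flags)
{
	atomic_init(&q->flags, flags);
	atomic_init(&q->waiters, 0);
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
}

/* Waiter side: announce ourselves first, then test the bit. */
static void demo_wait_on_bit(struct demo_waitq *q, int bit)
{
	unsigned long mask = 1UL << bit;

	pthread_mutex_lock(&q->lock);
	atomic_fetch_add(&q->waiters, 1);	/* seq_cst: visible before the test */
	while (atomic_load(&q->flags) & mask)
		pthread_cond_wait(&q->cond, &q->lock);
	atomic_fetch_sub(&q->waiters, 1);
	pthread_mutex_unlock(&q->lock);
}

/* Waker side: clear with release ordering, full fence, then check waiters. */
static void demo_clear_and_wake_up_bit(struct demo_waitq *q, int bit)
{
	atomic_fetch_and_explicit(&q->flags, ~(1UL << bit),
				  memory_order_release);
	/* Without this fence the waiters check could be ordered before the clear. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&q->waiters, memory_order_relaxed) > 0) {
		pthread_mutex_lock(&q->lock);
		pthread_cond_broadcast(&q->cond);
		pthread_mutex_unlock(&q->lock);
	}
}

static void *demo_waiter_thread(void *arg)
{
	demo_wait_on_bit(arg, DEMO_ASYNC_BIT);
	return NULL;
}

/* Runs one waiter against one waker; returns the final flag word. */
static unsigned long demo_run(void)
{
	struct demo_waitq q;
	pthread_t t;
	unsigned long final;
	int rc;

	demo_init(&q, 1UL << DEMO_ASYNC_BIT);
	rc = pthread_create(&t, NULL, demo_waiter_thread, &q);
	assert(rc == 0);
	(void)rc;
	demo_clear_and_wake_up_bit(&q, DEMO_ASYNC_BIT);
	pthread_join(t, NULL);	/* would hang here on a lost wakeup */
	final = atomic_load(&q.flags);
	pthread_cond_destroy(&q.cond);
	pthread_mutex_destroy(&q.lock);
	return final;
}
```

Calling `demo_run()` from a test harness (built with `-pthread` where the toolchain needs it) should return with the bit cleared whichever thread runs first: either the waiter sees the cleared bit, or the waker sees the waiter and signals it. A single run cannot demonstrate the race itself; the sketch only shows where the fence sits relative to the clear and the waiter check.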

 fs/ceph/dir.c  | 3 +--
 fs/ceph/file.c | 6 ++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 32973c62c1a23..d18c0eaef9b7e 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1260,8 +1260,7 @@ static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc,
 	spin_unlock(&fsc->async_unlink_conflict_lock);

 	spin_lock(&dentry->d_lock);
-	di->flags &= ~CEPH_DENTRY_ASYNC_UNLINK;
-	wake_up_bit(&di->flags, CEPH_DENTRY_ASYNC_UNLINK_BIT);
+	clear_and_wake_up_bit(CEPH_DENTRY_ASYNC_UNLINK_BIT, &di->flags);
 	spin_unlock(&dentry->d_lock);

 	synchronize_rcu();
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 978acd3d4b329..d7b943feb9320 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -579,8 +579,7 @@ static void wake_async_create_waiters(struct inode *inode,

 	spin_lock(&ci->i_ceph_lock);
 	if (ci->i_ceph_flags & CEPH_I_ASYNC_CREATE) {
-		ci->i_ceph_flags &= ~CEPH_I_ASYNC_CREATE;
-		wake_up_bit(&ci->i_ceph_flags, CEPH_ASYNC_CREATE_BIT);
+		clear_and_wake_up_bit(CEPH_ASYNC_CREATE_BIT, &ci->i_ceph_flags);

 		if (ci->i_ceph_flags & CEPH_I_ASYNC_CHECK_CAPS) {
 			ci->i_ceph_flags &= ~CEPH_I_ASYNC_CHECK_CAPS;
@@ -762,8 +761,7 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
 	}

 	spin_lock(&dentry->d_lock);
-	di->flags &= ~CEPH_DENTRY_ASYNC_CREATE;
-	wake_up_bit(&di->flags, CEPH_DENTRY_ASYNC_CREATE_BIT);
+	clear_and_wake_up_bit(CEPH_DENTRY_ASYNC_CREATE_BIT, &di->flags);
 	spin_unlock(&dentry->d_lock);

 	return ret;