From: Viacheslav Dubeyko Slava.Dubeyko@ibm.com
[ Upstream commit 5824ccba9a39a3ad914fc9b2972a2c1119abaac9 ]
The Coverity Scan service has detected potential race condition in ceph_ioctl_lazyio() [1].
The CID 1591046 contains explanation: "Check of thread-shared field evades lock acquisition (LOCK_EVASION). Thread1 sets fmode to a new value. Now the two threads have an inconsistent view of fmode and updates to fields correlated with fmode may be lost. The data guarded by this critical section may be read while in an inconsistent state or modified by multiple racing threads. In ceph_ioctl_lazyio: Checking the value of a thread-shared field outside of a locked region to determine if a locked operation involving that thread shared field has completed. (CWE-543)".
The patch places fi->fmode field access under ci->i_ceph_lock protection. Also, it introduces the is_file_already_lazy variable that is set under the lock and it is checked later out of scope of critical section.
[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=159...
Signed-off-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com Reviewed-by: Alex Markuze amarkuze@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
YES - The race arises because the pre-patch code reads `fi->fmode` without holding `ci->i_ceph_lock`, so two threads can both see the lazy bit clear, then each increment `ci->i_nr_by_mode[ffs(CEPH_FILE_MODE_LAZY)]++` before either releases the lock. That leaves the counter permanently elevated, desynchronising the per-mode counts that `ceph_put_fmode()` relies on to drop capability refs (`fs/ceph/ioctl.c` before line 212, contrasted with `fs/ceph/caps.c:4744-4789`). - The patch now performs the test and update while the lock is held (`fs/ceph/ioctl.c:212-220`), eliminating the window where concurrent callers can both act on stale state; the new `is_file_already_lazy` flag preserves the existing logging/`ceph_check_caps()` calls after the lock is released (`fs/ceph/ioctl.c:221-228`) so behaviour remains unchanged aside from closing the race. - Keeping `i_nr_by_mode` accurate is important beyond metrics: it feeds `__ceph_caps_file_wanted()` when deciding what caps to request or drop (`fs/ceph/caps.c:1006-1061`). With the race, a leaked lazy count prevents the last close path from seeing the inode as idle, delaying capability release and defeating the lazyio semantics the ioctl is supposed to provide. - The change is tightly scoped (one function, no API or struct changes, same code paths still call `__ceph_touch_fmode()` and `ceph_check_caps()`), so regression risk is minimal while the fix hardens a locking invariant already respected by other fmode transitions such as `ceph_get_fmode()` (`fs/ceph/caps.c:4727-4754`). - No newer infrastructure is required—the fields, lock, and helpers touched here have existed in long-term stable kernels—so this bug fix is suitable for stable backporting despite the likely need to adjust the `doutc` helper name on older branches.
fs/ceph/ioctl.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c index e861de3c79b9e..15cde055f3da1 100644 --- a/fs/ceph/ioctl.c +++ b/fs/ceph/ioctl.c @@ -246,21 +246,28 @@ static long ceph_ioctl_lazyio(struct file *file) struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_mds_client *mdsc = ceph_inode_to_fs_client(inode)->mdsc; struct ceph_client *cl = mdsc->fsc->client; + bool is_file_already_lazy = false;
+ spin_lock(&ci->i_ceph_lock); if ((fi->fmode & CEPH_FILE_MODE_LAZY) == 0) { - spin_lock(&ci->i_ceph_lock); fi->fmode |= CEPH_FILE_MODE_LAZY; ci->i_nr_by_mode[ffs(CEPH_FILE_MODE_LAZY)]++; __ceph_touch_fmode(ci, mdsc, fi->fmode); - spin_unlock(&ci->i_ceph_lock); + } else { + is_file_already_lazy = true; + } + spin_unlock(&ci->i_ceph_lock); + + if (is_file_already_lazy) { + doutc(cl, "file %p %p %llx.%llx already lazy\n", file, inode, + ceph_vinop(inode)); + } else { doutc(cl, "file %p %p %llx.%llx marked lazy\n", file, inode, ceph_vinop(inode));
ceph_check_caps(ci, 0); - } else { - doutc(cl, "file %p %p %llx.%llx already lazy\n", file, inode, - ceph_vinop(inode)); } + return 0; }