On Wed, May 24, 2023 at 6:02 PM Alexander Aring aahringo@redhat.com wrote:
This patch fixes a possible plock op collisions when using F_SETLKW lock requests and fsid, number and owner are not enough to identify a result for a pending request. The ltp testcases [0] and [1] are examples when this is not enough in case of using classic posix locks with threads and open filedescriptor posix locks.
The idea to fix the issue here is to place all lock request in order. In case of non F_SETLKW lock request (indicated if wait is set or not) the lock requests are ordered inside the recv_list. If a result comes back the right plock op can be found by the first plock_op in recv_list which has not info.wait set. This can be done only by non F_SETLKW plock ops as dlm_controld always reads a specific plock op (list_move_tail() from send_list to recv_mlist) and write the result immediately back.
This behaviour is for F_SETLKW not possible as multiple waiters can be get a result back in an random order. To avoid a collisions in cases like [0] or [1] this patch adds more fields to compare the plock operations as the lock request is the same. This is also being made in NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We still can't find the exact lock request for a specific result if the lock request is the same, but if this is the case we don't care the order how the identical lock requests get their result back to grant the lock.
When the recv_list contains multiple indistinguishable requests, this can only be because they originated from multiple threads of the same process. In that case, I agree that it doesn't matter which of those requests we "complete" in dev_write() as long as we only complete one request. We do need to compare the additional request fields in dev_write() to find a suitable request, so that makes sense as well. We need to compare all of the fields that identify a request (optype, ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the "right" request (or in case there is more than one identical request, a "suitable" request).
The above patch description doesn't match the code anymore, and the code doesn't fully revert the recv_list splitting of the previous version.
[0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/ke... [1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/ke... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/incl... [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l...
Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring aahringo@redhat.com
change since v2:
- don't split recv_list into recv_setlkw_list
fs/dlm/plock.c | 43 ++++++++++++++++++++++++++++++------------- 1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c index 31bc601ee3d8..53d17dbbb716 100644 --- a/fs/dlm/plock.c +++ b/fs/dlm/plock.c @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count, if (op->info.flags & DLM_PLOCK_FL_CLOSE) list_del(&op->list); else
list_move(&op->list, &recv_list);
list_move_tail(&op->list, &recv_list);
^ This should be obsolete, but it won't hurt, either.
memcpy(&info, &op->info, sizeof(info)); } spin_unlock(&ops_lock);
@@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count, return -EINVAL;
spin_lock(&ops_lock);
list_for_each_entry(iter, &recv_list, list) {
if (iter->info.fsid == info.fsid &&
iter->info.number == info.number &&
iter->info.owner == info.owner) {
list_del_init(&iter->list);
memcpy(&iter->info, &info, sizeof(info));
if (iter->data)
do_callback = 1;
else
iter->done = 1;
op = iter;
break;
if (info.wait) {
We should be able to use the same list_for_each_entry() loop for F_SETLKW requests (which have info.wait set) as for all other requests as far as I can see.
list_for_each_entry(iter, &recv_list, list) {
if (iter->info.fsid == info.fsid &&
iter->info.number == info.number &&
iter->info.owner == info.owner &&
iter->info.pid == info.pid &&
iter->info.start == info.start &&
iter->info.end == info.end &&
iter->info.ex == info.ex &&
iter->info.wait) {
op = iter;
break;
} }
} else {
list_for_each_entry(iter, &recv_list, list) {
if (!iter->info.wait) {
op = iter;
break;
}
}
}
if (op) {
list_del_init(&op->list);
memcpy(&op->info, &info, sizeof(info));
if (op->data)
do_callback = 1;
else
op->done = 1; }
Can't this code just remain in the list_for_each_entry() loop?
spin_unlock(&ops_lock);
-- 2.31.1
Andreas