Hi,
This patch series tries to avoid issues when plock ops with the DLM_PLOCK_FL_CLOSE flag set send a reply back, which should never happen. The problem becomes more serious when introducing a new plock op for which no answer is expected.
In v2 I changed the stable fix to check on the DLM_PLOCK_FL_CLOSE flag, as this can also be used to fix the potential issue on older kernels and it does not change the UAPI. For newer user space applications the new DLM_PLOCK_FL_NO_REPLY flag tells user space to never send a result back, so the filtering happens earlier in user space. For older user space software we filter the result in the kernel.
This relies on the flags being the same in the request and the reply, which is the case for dlm_controld.
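To make that expectation concrete, here is a minimal sketch of the user space side as this series assumes it (not dlm_controld's actual code, just an illustration with error handling left out): the daemon reads a struct dlm_plock_info from the dlm_plock misc device, fills in rv, copies everything else, including the flags, back unchanged, and with this series skips the reply entirely when DLM_PLOCK_FL_NO_REPLY is set.

#include <unistd.h>
#include <linux/dlm_plock.h>

#ifndef DLM_PLOCK_FL_NO_REPLY
#define DLM_PLOCK_FL_NO_REPLY 2	/* added by this series */
#endif

/* fd is the open dlm_plock misc device */
static void handle_one_op(int fd)
{
	struct dlm_plock_info info;

	if (read(fd, &info, sizeof(info)) != sizeof(info))
		return;

	/* ... resolve the posix lock request across the cluster ... */
	info.rv = 0;

	/* new behaviour: the kernel asked us to never answer this op */
	if (info.flags & DLM_PLOCK_FL_NO_REPLY)
		return;

	/* all other fields, including flags, go back unchanged */
	write(fd, &info, sizeof(info));
}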
Also fix the wrapped string and don't spam the user when ignoring replies.
- Alex
Alexander Aring (3):
  fs: dlm: ignore DLM_PLOCK_FL_CLOSE flag results
  fs: dlm: introduce DLM_PLOCK_FL_NO_REPLY flag
  fs: dlm: allow to F_SETLKW getting interrupted
 fs/dlm/plock.c                 | 107 ++++++++++++++++++++++++---------
 include/uapi/linux/dlm_plock.h |   2 +
 2 files changed, 81 insertions(+), 28 deletions(-)
This patch ignores dlm plock results that have DLM_PLOCK_FL_CLOSE set. When DLM_PLOCK_FL_CLOSE is set, no reply is expected, so the plock op cannot be matched and the result cannot be delivered to the caller. Some user space software, like dlm_controld (the common and only known implementation using this UAPI), can send an error back even though a result is never expected.
The patch therefore ignores results with DLM_PLOCK_FL_CLOSE set, but requires that the user space application sends the result back with DLM_PLOCK_FL_CLOSE set, which is the case for dlm_controld.
Fixes: 901025d2f319 ("dlm: make plock operation killable")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 70a4752ed913..869595a995f7 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -433,6 +433,14 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 	if (check_version(&info))
 		return -EINVAL;
 
+	/* Some dlm user space software will send replies back,
+	 * even if DLM_PLOCK_FL_CLOSE is set e.g. if an error occur.
+	 * We can't match them in recv_list because they were never
+	 * be part of it.
+	 */
+	if (info.flags & DLM_PLOCK_FL_CLOSE)
+		return count;
+
 	/*
 	 * The results for waiting ops (SETLKW) can be returned in any
 	 * order, so match all fields to find the op. The results for
This patch introduces a new flag DLM_PLOCK_FL_NO_REPLY for the case that a dlm plock operation should never get a reply. Currently this is partly handled through DLM_PLOCK_FL_CLOSE, but DLM_PLOCK_FL_CLOSE has an additional meaning: it removes all waiters for a specific nodeid/owner pair by doing an unlock operation. That no reply is ever sent for DLM_PLOCK_FL_CLOSE is also not true for some user space applications when an error occurs. The new DLM_PLOCK_FL_NO_REPLY flag tells the user space application to never send a reply back. As it is now set together with DLM_PLOCK_FL_CLOSE, we can use DLM_PLOCK_FL_NO_REPLY to detect older user space applications that still reply and drop the message.
Assuming the user space application just copies the flags from the request into the reply, we can now use DLM_PLOCK_FL_NO_REPLY as a workaround to ignore replies from older dlm_controld versions.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c                 | 16 ++++++++++------
 include/uapi/linux/dlm_plock.h |  1 +
 2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 869595a995f7..1fa5b04d0298 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -96,7 +96,7 @@ static void do_unlock_close(const struct dlm_plock_info *info)
 	op->info.end = OFFSET_MAX;
 	op->info.owner = info->owner;
 
-	op->info.flags |= DLM_PLOCK_FL_CLOSE;
+	op->info.flags |= (DLM_PLOCK_FL_CLOSE | DLM_PLOCK_FL_NO_REPLY);
 	send_op(op);
 }
 
@@ -293,7 +293,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 	op->info.owner = (__u64)(long) fl->fl_owner;
 
 	if (fl->fl_flags & FL_CLOSE) {
-		op->info.flags |= DLM_PLOCK_FL_CLOSE;
+		op->info.flags |= (DLM_PLOCK_FL_CLOSE | DLM_PLOCK_FL_NO_REPLY);
 		send_op(op);
 		rv = 0;
 		goto out;
@@ -392,7 +392,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
 	spin_lock(&ops_lock);
 	if (!list_empty(&send_list)) {
 		op = list_first_entry(&send_list, struct plock_op, list);
-		if (op->info.flags & DLM_PLOCK_FL_CLOSE)
+		if (op->info.flags & DLM_PLOCK_FL_NO_REPLY)
 			list_del(&op->list);
 		else
 			list_move_tail(&op->list, &recv_list);
@@ -407,7 +407,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
 	   that were generated by the vfs cleaning up for a close
 	   (the process did not make an unlock call). */
 
-	if (op->info.flags & DLM_PLOCK_FL_CLOSE)
+	if (op->info.flags & DLM_PLOCK_FL_NO_REPLY)
 		dlm_release_plock_op(op);
 
 	if (copy_to_user(u, &info, sizeof(info)))
@@ -434,11 +434,15 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		return -EINVAL;
 
 	/* Some dlm user space software will send replies back,
-	 * even if DLM_PLOCK_FL_CLOSE is set e.g. if an error occur.
+	 * even if DLM_PLOCK_FL_NO_REPLY is set e.g. if an error
+	 * occur as the op is unknown, etc.
+	 *
 	 * We can't match them in recv_list because they were never
 	 * be part of it.
+	 *
+	 * In the future, this handling could be removed.
 	 */
-	if (info.flags & DLM_PLOCK_FL_CLOSE)
+	if (info.flags & DLM_PLOCK_FL_NO_REPLY)
 		return count;
 
 	/*
diff --git a/include/uapi/linux/dlm_plock.h b/include/uapi/linux/dlm_plock.h
index 63b6c1fd9169..8dfa272c929a 100644
--- a/include/uapi/linux/dlm_plock.h
+++ b/include/uapi/linux/dlm_plock.h
@@ -25,6 +25,7 @@ enum {
 };
 
 #define DLM_PLOCK_FL_CLOSE 1
+#define DLM_PLOCK_FL_NO_REPLY 2
 
 struct dlm_plock_info {
 	__u32 version[3];
This patch implements the dlm plock F_SETLKW interruption feature. If the pending plock operation has not been sent to user space yet, it can simply be dropped from the send_list. If it has already been sent, we need to try to remove the waiter in the dlm user space tool. If that was successful, a reply with optype DLM_PLOCK_OP_CANCEL instead of DLM_PLOCK_OP_LOCK comes back (the DLM_PLOCK_FL_NO_REPLY flag was cleared in user space) to signal that the cancellation was successful. If a result with optype DLM_PLOCK_OP_LOCK comes back, the cancellation was not successful.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c                 | 91 ++++++++++++++++++++++++----------
 include/uapi/linux/dlm_plock.h |  1 +
 2 files changed, 66 insertions(+), 26 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 1fa5b04d0298..0781a80c25f7 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -29,6 +29,9 @@ struct plock_async_data {
 
 struct plock_op {
 	struct list_head list;
+#define DLM_PLOCK_OP_FLAG_SENT 0
+#define DLM_PLOCK_OP_FLAG_INTERRUPTED 1
+	unsigned long flags;
 	int done;
 	struct dlm_plock_info info;
 	/* if set indicates async handling */
@@ -74,30 +77,25 @@ static void send_op(struct plock_op *op)
 	wake_up(&send_wq);
 }
 
-/* If a process was killed while waiting for the only plock on a file,
-   locks_remove_posix will not see any lock on the file so it won't
-   send an unlock-close to us to pass on to userspace to clean up the
-   abandoned waiter. So, we have to insert the unlock-close when the
-   lock call is interrupted. */
-
-static void do_unlock_close(const struct dlm_plock_info *info)
+static int do_lock_cancel(const struct plock_op *orig_op)
 {
 	struct plock_op *op;
 
 	op = kzalloc(sizeof(*op), GFP_NOFS);
 	if (!op)
-		return;
+		return -ENOMEM;
+
+	op->info = orig_op->info;
+	op->info.optype = DLM_PLOCK_OP_CANCEL;
+	op->info.flags = DLM_PLOCK_FL_NO_REPLY;
 
-	op->info.optype = DLM_PLOCK_OP_UNLOCK;
-	op->info.pid = info->pid;
-	op->info.fsid = info->fsid;
-	op->info.number = info->number;
-	op->info.start = 0;
-	op->info.end = OFFSET_MAX;
-	op->info.owner = info->owner;
-
-	op->info.flags |= (DLM_PLOCK_FL_CLOSE | DLM_PLOCK_FL_NO_REPLY);
 	send_op(op);
+	wait_event(recv_wq, (orig_op->done != 0));
+
+	if (orig_op->info.optype == DLM_PLOCK_OP_CANCEL)
+		return 0;
+
+	return 1;
 }
 
 int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
@@ -156,7 +154,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 	send_op(op);
 
 	if (op->info.wait) {
-		rv = wait_event_killable(recv_wq, (op->done != 0));
+		rv = wait_event_interruptible(recv_wq, (op->done != 0));
 		if (rv == -ERESTARTSYS) {
 			spin_lock(&ops_lock);
 			/* recheck under ops_lock if we got a done != 0,
@@ -166,17 +164,37 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 				spin_unlock(&ops_lock);
 				goto do_lock_wait;
 			}
-			list_del(&op->list);
-			spin_unlock(&ops_lock);
+
+			if (!test_bit(DLM_PLOCK_OP_FLAG_SENT, &op->flags)) {
+				/* part of send_list, user never saw the op */
+				list_del(&op->list);
+				spin_unlock(&ops_lock);
+				rv = -EINTR;
+			} else {
+				set_bit(DLM_PLOCK_OP_FLAG_INTERRUPTED, &op->flags);
+				spin_unlock(&ops_lock);
+				rv = do_lock_cancel(op);
+				switch (rv) {
+				case 0:
+					rv = -EINTR;
+					break;
+				case 1:
+					/* cancellation wasn't successful but op is done */
+					goto do_lock_wait;
+				default:
+					/* internal error doing cancel we need to wait */
+					goto wait;
+				}
+			}
 
 			log_debug(ls, "%s: wait interrupted %x %llx pid %d",
 				  __func__, ls->ls_global_id,
 				  (unsigned long long)number, op->info.pid);
-			do_unlock_close(&op->info);
 			dlm_release_plock_op(op);
 			goto out;
 		}
 	} else {
+wait:
 		wait_event(recv_wq, (op->done != 0));
 	}
 
@@ -392,10 +410,12 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
 	spin_lock(&ops_lock);
 	if (!list_empty(&send_list)) {
 		op = list_first_entry(&send_list, struct plock_op, list);
-		if (op->info.flags & DLM_PLOCK_FL_NO_REPLY)
+		if (op->info.flags & DLM_PLOCK_FL_NO_REPLY) {
 			list_del(&op->list);
-		else
+		} else {
 			list_move_tail(&op->list, &recv_list);
+			set_bit(DLM_PLOCK_OP_FLAG_SENT, &op->flags);
+		}
 		memcpy(&info, &op->info, sizeof(info));
 	}
 	spin_unlock(&ops_lock);
@@ -454,6 +474,27 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 	spin_lock(&ops_lock);
 	if (info.wait) {
 		list_for_each_entry(iter, &recv_list, list) {
+			/* A very specific posix lock case allows two
+			 * lock request with the same meaning by using
+			 * threads. It makes no sense from the application
+			 * to do such request, however it is possible.
+			 * We need to check the state for cancellation because
+			 * we need to know the instance which is interrupted
+			 * if two or more of the "same" lock requests are
+			 * in waiting state and got interrupted.
+			 *
+			 * TODO we should move to a instance reference from
+			 * request and reply and not go over lock states, but
+			 * it seems going over lock states and get the instance
+			 * does not make any problems (at least there were no
+			 * issues found yet) but it's much cleaner to not think
+			 * about all possible special cases and states to instance
+			 * has no 1:1 mapping anymore.
+			 */
+			if (info.optype == DLM_PLOCK_OP_CANCEL &&
+			    !test_bit(DLM_PLOCK_OP_FLAG_INTERRUPTED, &iter->flags))
+				continue;
+
 			if (iter->info.fsid == info.fsid &&
 			    iter->info.number == info.number &&
 			    iter->info.owner == info.owner &&
@@ -477,9 +518,7 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 
 	if (op) {
 		/* Sanity check that op and info match. */
-		if (info.wait)
-			WARN_ON(op->info.optype != DLM_PLOCK_OP_LOCK);
-		else
+		if (!info.wait)
 			WARN_ON(op->info.fsid != info.fsid ||
 				op->info.number != info.number ||
 				op->info.owner != info.owner ||
diff --git a/include/uapi/linux/dlm_plock.h b/include/uapi/linux/dlm_plock.h
index 8dfa272c929a..9c4c083c824a 100644
--- a/include/uapi/linux/dlm_plock.h
+++ b/include/uapi/linux/dlm_plock.h
@@ -22,6 +22,7 @@ enum {
 	DLM_PLOCK_OP_LOCK = 1,
 	DLM_PLOCK_OP_UNLOCK,
 	DLM_PLOCK_OP_GET,
+	DLM_PLOCK_OP_CANCEL,
 };
 
 #define DLM_PLOCK_FL_CLOSE 1
Hi,
On Thu, Jul 13, 2023 at 12:49 PM Alexander Aring <aahringo@redhat.com> wrote:
This patch implements the dlm plock F_SETLKW interruption feature. If the pending plock operation has not been sent to user space yet, it can simply be dropped from the send_list. If it has already been sent, we need to try to remove the waiter in the dlm user space tool. If that was successful, a reply with optype DLM_PLOCK_OP_CANCEL instead of DLM_PLOCK_OP_LOCK comes back (the DLM_PLOCK_FL_NO_REPLY flag was cleared in user space) to signal that the cancellation was successful. If a result with optype DLM_PLOCK_OP_LOCK comes back, the cancellation was not successful.
There is another use case for this op that is only used kernel-internally by nfs: F_CANCELLK [0]. I will try to implement this feature, as I think the current behaviour is broken [1]. An unlock is not a revert, and if the lock request is in waiting state, unlocking will do exactly nothing.
I am still questioning how the API of [0] is supposed to work, as [0] does not evaluate any return value to see whether the cancellation was successful or not. Maybe the intent is: cancel, and if that was not successful, unlock. But an unlock is not a revert, and posix locks support up/downgrading locks, e.g. read/write locks. However, I think unlocking when the cancellation wasn't successful is what is meant here.
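To be explicit about the interpretation I am assuming, here is a rough sketch; the helper names are hypothetical and do not exist in lockd or the plock code, they only illustrate the "cancel, otherwise unlock" reading:

/* Hypothetical sketch of "cancel, otherwise unlock" semantics. */
static int cancel_or_unlock(struct file *file, struct file_lock *fl)
{
	int rv;

	/* try to remove the still-waiting lock request */
	rv = try_cancel_waiting_lock(file, fl);	/* hypothetical helper */
	if (rv == 0)
		return 0;	/* waiter removed, nothing was granted */

	/* Cancellation failed, so the lock was already granted.  Note
	 * that an unlock is not a revert: it may drop or downgrade a
	 * lock the caller already held on the same range.
	 */
	return unlock_granted_lock(file, fl);	/* hypothetical helper */
}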
Besides that, I will change DLM_PLOCK_OP_CANCEL so that it always expects a reply back.
- Alex
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l...
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/g...