On Thu, Jun 1, 2023 at 9:10 PM Alexander Aring aahringo@redhat.com wrote:
Hi,
On Thu, Jun 1, 2023 at 1:11 PM Andreas Gruenbacher agruenba@redhat.com wrote:
On Thu, Jun 1, 2023 at 6:28 PM Alexander Aring aahringo@redhat.com wrote:
Hi,
On Tue, May 30, 2023 at 1:40 PM Andreas Gruenbacher agruenba@redhat.com wrote:
On Tue, May 30, 2023 at 4:08 PM Alexander Aring aahringo@redhat.com wrote:
Hi,
On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher agruenba@redhat.com wrote:
On Tue, May 30, 2023 at 12:19 AM Alexander Aring aahringo@redhat.com wrote: > Hi, > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher > agruenba@redhat.com wrote: > > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring aahringo@redhat.com wrote: > > > This patch fixes a possible plock op collisions when using F_SETLKW lock > > > requests and fsid, number and owner are not enough to identify a result > > > for a pending request. The ltp testcases [0] and [1] are examples when > > > this is not enough in case of using classic posix locks with threads and > > > open filedescriptor posix locks. > > > > > > The idea to fix the issue here is to place all lock request in order. In > > > case of non F_SETLKW lock request (indicated if wait is set or not) the > > > lock requests are ordered inside the recv_list. If a result comes back > > > the right plock op can be found by the first plock_op in recv_list which > > > has not info.wait set. This can be done only by non F_SETLKW plock ops as > > > dlm_controld always reads a specific plock op (list_move_tail() from > > > send_list to recv_mlist) and write the result immediately back. > > > > > > This behaviour is for F_SETLKW not possible as multiple waiters can be > > > get a result back in an random order. To avoid a collisions in cases > > > like [0] or [1] this patch adds more fields to compare the plock > > > operations as the lock request is the same. This is also being made in > > > NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We > > > still can't find the exact lock request for a specific result if the > > > lock request is the same, but if this is the case we don't care the > > > order how the identical lock requests get their result back to grant the > > > lock. > > > > When the recv_list contains multiple indistinguishable requests, this > > can only be because they originated from multiple threads of the same > > process. In that case, I agree that it doesn't matter which of those > > requests we "complete" in dev_write() as long as we only complete one > > request. We do need to compare the additional request fields in > > dev_write() to find a suitable request, so that makes sense as well. > > We need to compare all of the fields that identify a request (optype, > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the > > "right" request (or in case there is more than one identical request, > > a "suitable" request). > > > > In my "definition" why this works is as you said the "identical > request". There is a more deeper definition of "when is a request > identical" and in my opinion it is here as: "A request A is identical > to request B when they get granted under the same 'time'" which is all > the fields you mentioned. > > Even with cancellation (F_SETLKW only) it does not matter which > "identical request" you cancel because the kernel and user > (dlm_controld) makes no relation between a lock request instance. You > need to have at least the same amount of "results" coming back from > user space as the amount you are waiting for a result for the same > "identical request".
That's not incorrect per se, but cancellations create an additional difficulty: they can either succeed or fail. To indicate that a cancellation has succeeded, a new type of message can be introduced (say, "CANCELLED"), and it's obvious that a CANCELLED message can only belong to a locking request that is being cancelled. When cancelling a locking request fails, the kernel will see a "locking request granted" message though, and when multiple identical locking requests are queued and only some of them have been cancelled, it won't be obvious which locking request a "locking request granted" message should be assigned to anymore. You really don't want to mix things up in that case.
This complication makes it a whole lot more difficult to reason about the correctness of the code. All that complexity is avoidable by sticking with a fixed mapping of requests and replies (i.e., a unique request identifier).
To put it differently, you can shoot yourself in the foot and still hop along on the other leg, but it may not be the best of all possible ideas.
It makes things more complicated, I agree and the reason why this works now is because there are a lot of "dependencies". I would love to have an unique identifier to make it possible that we can follow an instance handle of the original lock request.
- an unique identifier which also works with the async lock request of
lockd case.
What's the lockd case you're referring to here, and why is it relevant for the problem at hand?
just mentioned that we need a solution which also works for the asynchronous lock request (F_SETLK, F_SETLKW) case, there is only one user lockd. [0] DLM plock handling implements the behaviour mentioned at [0] but lm_grant() callback can also return negative values and signals that the lock request was cancelled (on nfs layer) and then need to tell it DLM. This however is not supported as we have a lack of cancellation.
Ouch, that's a bit messy. But if the vfs_lock_file() description is accurate, then only F_SETLK requests arriving via lockd can be asynchronous, and F_SETLKW requests never are asynchronous. And we only need to cancel F_SETLKW requests. It follows that we only ever need to cancel synchronous requests.
I looked into the code and I think the second sentence of [0] is important regarding to F_SLEEP "Callers expecting ->lock() to return asynchronously will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if) the request is for a blocking lock.". If lockd does a lock request, it will then set args.wait to 1 if it was F_SETLKW [1].
Hmm, this sets args.block?
The receiver will then create a blocking request [2] and set FL_SLEEP [3]; it does unset FL_SLEEP when args.wait (now as wait parameter) wasn't set. There is a case of [5] which unset wait again, but it seems we can end with FL_SLEEP there anyway.
I think we have currently an issue here that we handle F_SETLK when F_SLEEP (in case of async lockd request) is handled as trylock which isn't right. Don't ask me why they are going over F_SLEEP and F_SETLK and not just using F_SETLKW to signal that it is a blocking request. So we actually have the F_SETLKW case but it's just signaled with F_SETLK + FL_SLEEP?
It seems to me that the vfs_lock_file() description and the distinction it makes between F_SETLK and F_SETLKW makes sense in a scenario when locking is implemented locally in the kernel, but not in the context of dlm_controld, where all the locking decisions are made in user-space: in the former scenario, F_SETLK requests can be decided immediately without sleeping, but F_SETLKW requests may sleep. In the dlm_controld scenario, locking requests can never be decided immediately, and the requesting task will always sleep. So from the point of view of lockd, any request to dlm_controld should probably be treated as asynchronous.
This makes things a bit more ugly than I was hoping for.
Thanks, Andreas
- Alex
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l... [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l... [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l... [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l... [5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l...