On 2/29/20 2:27 AM, Thomas Gleixner wrote:
"Pierre-Loup A. Griffais" pgriffais@valvesoftware.com writes:
On 2/28/20 1:25 PM, Thomas Gleixner wrote:
Peter Zijlstra peterz@infradead.org writes:
Thomas mentioned something like that, the problem is, of course, that we then want to fix a whole bunch of historical ills, and the problem becomes much bigger.
We keep piling features on top of an interface and mechanism which is fragile as hell and horrible to maintain. Adding vectoring, multi size and whatever is not making it any better.
There is also the long standing issue with NUMA, which we can't address with the current pile at all.
So I'm really advocating that all involved parties sit down ASAP and hash out a new and less convoluted mechanism where all the magic new features can be addressed in a sane way so that the 'F' in Futex really only means Fast and not some other word starting with 'F'.
Are you specifically talking about the interface, or the mechanism itself? Would you be OK with a new syscall that calls into the same code as this patch? It does seem like that's what we want, so if we rewrote the mechanism I'm not convinced it would come out any different. But the interface itself seems fair game to rewrite, as the current futex syscall is turning into an ioctl of sorts.
No, you are misreading what I said. How does a new syscall make any difference? It still adds new crap to a maze which is already in a state of dubious maintainability.
I was just going by the context added by Peter, which seemed to imply your concerns were mostly around the interface, because I couldn't understand a clear course of action to follow just from your email. And frankly, still can't, but hopefully you can help us get there.
This solves a real problem with a real usecase; so I'd like to stay practical and not go into deeper issues like solving NUMA support for all of futex in the interest of users waiting at the other end. Can you point us to your preferred approach just for the scope of what we're trying to accomplish?
If we go by the argument that something solves a real use case and take this as justification to proliferate existing crap, then we never get to the point where things get redesigned from ground up. Quite the contrary, we are going to duct tape it to death.
It does not matter at all whether the syscall is multiplexing or split up into 5 different ones. That's a purely cosmetic exercise.
While all the currently proposed extensions (multiple wait, variable size) make sense conceptually, I'm really uncomfortable to just cram them into the existing code. They create an ABI which we have to maintain forever.
From experience I just know that every time we extended the futex interface we opened another can of worms which haunted us for years if not for more than a decade. Guess who has to deal with that. Surely not the people who drive by and solve their real world usecases. Just go and read the changelog history of futexes very carefully and you might understand what kind of complex beasts they are.
At some point we simply have to say stop, sit down and figure out which kind of functionality we really need in order to solve real world user space problems and which of the gazillion futex (mis)features are just there as historical ballast and do not have to be supported in a new implementation. REQUEUE is just the most obvious example.
I completely understand that you want to stay practical and just want to solve your particular itch, but please understand that the people who have to deal with the fallout and have dealt with it for 15+ years have very practical reasons to say no.
Note that it would have been nice to get that high-level feedback on the first version; instead we just received back specific feedback on the implementation itself, and questions about usecase/motivation that we tried to address, but that didn't elicit any follow-ups.
Please bear with me for a second in case you thought you were obviously very clear about the path forward, but are you saying that:
1. Our usecase is valid, but we're not correct about futex being the right fit for it, and we should design and implement a new primitive to handle it?
2. Our usecase is valid, and our research showing that futex is the right fit for it might be correct, but futex has to be significantly refactored before accepting this new feature (or any new feature)?
If it was 1., I think our new solution would either end up looking more or less exactly like futex, just with some of the more exotic functionality removed (although even that is arguable, since I wouldn't be surprised if we ended up using e.g. requeue for some of the more complex migration scenarios). In which case I assume someone else would ask the question of why we're doing this new thing instead of adding to futex. OR, if intentionally made not futex-like, it would end up not being optimal, which would make it not the right solution and a non-starter to begin with. There's a reason we moved away from eventfd, even ignoring the fd exhaustion problem that some problematic apps fall victim to.
If it's 2., then we'd be hard-pressed to proceed forward without your guidance.
Conceptually it seems like multiple wait is an important missing feature in futex compared to core threading primitives of other platforms. It isn't the first time that the lack of it has come up for us and other game developers. Due to futex being so central and important, I completely understand it is tricky to get right and might be hard to maintain if not done correctly. It seems worthwhile to undertake, at least from our limited perspective. We'd be glad to help upstream get there, if possible.
Thanks, - Pierre-Loup
Thanks,
tglx