On Thu, 14 Dec 2023 at 10:07, Stephen Röttger sroettger@google.com wrote:
AIUI, the madvise(DONTNEED) should effectively only change the content of anonymous pages, i.e. it's similar to a memset(0) in that case. That's why we added this special case: if you want to madvise(DONTNEED) an anonymous page, you should have write permissions to the page.
Hmm. I actually would be happier if we just made that change in general. Maybe even without sealing, but I agree that it *definitely* makes sense in general as a sealing thing.
IOW, just saying
"madvise(DONTNEED) needs write permissions to an anonymous mapping when sealed"
makes 100% sense to me. Having a separate _flag_ to give sensible semantics is just odd.
IOW, what I really want is exactly that "sensible semantics, not random flags".
Particularly for new system calls with fairly specialized use, I think it's very important that the semantics are sensible on a conceptual level, and that we do not add system calls that are based on "random implementation issue of the day".
Yes, yes, then as we have to maintain things long-term, and we hit some compatibility issue, at *THAT* point we'll end up facing nasty "we had an implementation that had these semantics in practice, so now we're stuck with it", but when introducing a new system call, let's try really hard to start off from those kinds of random things.
Wouldn't it be lovely if we can just come up with a sane set of "this is what it means to seal a vma", and enumerate those, and make those sane conceptual rules be the initial definition. By all means have a "flags" argument for future cases when we figure out there was something wrong or the notion needed to be extended, but if we already *start* with random extensions, I feel there's something wrong with the whole concept.
So I would really wish for the first version of
mseal(start, len, flags);
to have "flags=0" be the one and only case we actually handle initially, and only add a single PROT_SEAL flag to mmap() that says "create this mapping already pre-sealed".
Strive very hard to make sealing be a single VM_SEALED flag in the vma->vm_flags that we already have, just admit that none of this matters on 32-bit architectures, so that VM_SEALED can just use one of the high flags that we have several free of (and that pkeys already depends on), and make this a standard feature with no #ifdef's.
Can chrome live with that? And what would the required semantics be? I'll start the list:
- you can't unmap or remap in any way (including over-mapping)
- you can't change protections (but with architecture support like pkey, you can obviously change the protections indirectly with PKRU etc)
- you can't do VM operations that change data without the area being writable (so the DONTNEED case - maybe there are others)
- anything else?
Wouldn't it be lovely to have just a single notion of sealing that is well-documented and makes sense, and doesn't require people to worry about odd special cases?
And yes, we'd have the 'flags' argument for future special cases, and hope really hard that it's never needed.
Linus