Michal Clapinski <mclapinski@google.com> writes:
This change introduces a way to check if an fd points to a memfd's original open fd (the one created by memfd_create).
We encountered an issue with migrating memfds in CRIU (checkpoint restore in userspace - it migrates running processes between machines). Imagine a scenario:
1. Create a memfd. By default it's open with O_RDWR and yet one can
   exec() to it (unlike with regular files, where one would get ETXTBSY).
2. Reopen that memfd with O_RDWR via /proc/self/fd/<fd>.
Now those 2 fds are indistinguishable from userspace. You can't exec() to either of them (since the reopen incremented inode->i_writecount) and their /proc/self/fdinfo/ are exactly the same. Unfortunately they are not the same. If you close the second one, the first one becomes exec()able again. If you close the first one, the other doesn't become exec()able. Therefore during migration it does matter which is recreated first and which is reopened but there is no way for CRIU to tell which was first.
So please bear with me...I'll confess that I don't fully understand the situation here, so this is probably a dumb question.
It seems like you are adding this "original open" test as a way of working around a quirk with the behavior of subsequent opens. I don't *think* that this is part of the intended, documented behavior of memfds, it's just something that happens. You're exposing an artifact of the current implementation.
Given that the two file descriptors are otherwise indistinguishable, might a better fix be to make them indistinguishable in this regard as well? Is there a good reason why the second fd doesn't become exec()able in this scenario and, if not, perhaps that behavior could be changed instead?
Thanks,
jon