On 17.12.21 21:47, Jason Gunthorpe wrote:
On Fri, Dec 17, 2021 at 12:36:43PM -0800, Linus Torvalds wrote:
- Take a R/O pin (RDMA, VFIO, ...)
-> refcount > 1
- memset(mem, 0xff, pagesize);
-> Write fault -> COW
I do not believe this is actually a bug.
You asked for a R/O pin, and you got one.
Then somebody else modified that page, and you got exactly what you asked for - a COW event. The original R/O pin has the original page that it asked for, and can read it just fine.
Hi Jason
To remind all, the GUP users, like RDMA, VFIO use FOLL_FORCE|FOLL_WRITE to get a 'r/o pin' specifically because of the
I heard that statement often. Can you point me at the code?
VFIO: drivers/vfio/vfio_iommu_type1.c
vaddr_get_pfns() will end up doing a pin_user_pages_remote(FOLL_LONGTERM) without FOLL_FORCE|FOLL_WRITE.
Is that added automatically internally?
Note the comment in the next patch
+ * + * TODO: although the security issue described does no longer apply in any case, + * the full consistency between the pinned pages and the pages mapped into the + * page tables of the MM only apply to short-term pinnings only. For + * FOLL_LONGTERM, FOLL_WRITE|FOLL_FORCE is required for now, which can be + * inefficient and still result in some consistency issues. Extend this + * mechanism to also provide full synchronicity to FOLL_LONGTERM, avoiding + * FOLL_WRITE|FOLL_FORCE.
Eg in RDMA we know of apps asking for a R/O pin of something in .bss then filling that something with data finally doing the actual DMA. Breaking COW after pin breaks those apps.
The above #5 can occur for O_DIRECT read and in that case the 'snapshot the data' is perfectly fine as racing the COW with the O_DIRECT read just resolves the race toward the read() direction.
IIRC there is some other scenario that motivated this patch?
1. I want to fix the COW security issue as documented. Reproducers in patch #11
2. I want to fix all of the other issues as documented and linked in the cover letter that result from the imprecise page_count check in COW code. Especially the ones where we have memory corruptions, because this is just not acceptable. There are reproducers as well for everybody that doesn't believe me.
But this series really just wants to fix the security issue as "part 1". Without any more breakages.
I'm sorry, but it's all described in the cover letter. Maybe TL;DR