On Mon, May 17, 2021 at 03:18:38PM -0700, Linus Torvalds wrote:
> On Sun, May 16, 2021 at 7:42 PM Matthew Wilcox <willy@infradead.org> wrote:
> > Ah, if you just put one dummy word in front, then dma_addr_t overlaps with page->mapping, which used to be fine, but now that we can map network queues to userspace, page->mapping has to be NULL.
> Well, that's a problem in itself. We shouldn't have those kinds of "oh, it uses one field from one union, and another field from another".
No, I agree. I wasn't aware of that particular problem until recently, and adjusting the contents of struct page always makes me a little nervous, so I haven't done it yet.
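To make that overlap concrete, here's a simplified sketch. It is an illustration only, not the real include/linux/mm_types.h: the field types are collapsed to fixed-width integers, the struct name is mine, and the word offsets assume a 32-bit build with a 64-bit dma_addr_t and no extra alignment padding (as on x86-32):

#include <stdint.h>

/* Word offsets count from the start of struct page, flags included. */
struct page_sketch {
	uint32_t flags;				/* word 0 */
	union {
		struct {			/* page cache / anon view */
			uint32_t lru_next;	/* word 1 */
			uint32_t lru_prev;	/* word 2 */
			uint32_t mapping;	/* word 3: must read as NULL
						 * for userspace-mapped pages */
			uint32_t index;		/* word 4 */
			uint32_t private;	/* word 5 */
		};
		struct {			/* "one dummy word in front" idea */
			uint32_t pad;		/* word 1: the dummy word */
			uint64_t dma_addr;	/* words 2-3: the second word
						 * lands on top of ->mapping */
		};
	};
};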
> At least that "low bit of the first word" thing is documented, but this kind of "oh, it uses dma_addr from one union and mapping from another" really needs to go away. Because that's just not acceptable.
>
> So that network stack thing should just make the mapping thing explicit then.
One of the pending patches for next merge window has this:
 	struct {	/* page_pool used by netstack */
-		/**
-		 * @dma_addr: might require a 64-bit value on
-		 * 32-bit architectures.
-		 */
+		unsigned long pp_magic;
+		struct page_pool *pp;
+		unsigned long _pp_mapping_pad;
 		unsigned long dma_addr[2];
 	};
(people are still bikeshedding the comments, etc, but fundamentally this is the new struct layout).

It's not the networking stack itself that uses page->mapping; it's if somebody decides to do something like call get_user_pages() on a mapped page_pool page, which then calls page_mapping() ... if that doesn't return NULL, then things go wrong.
Andrey Ryabinin found the path:

: Some implementation of the flush_dcache_page(), also set_page_dirty()
: can be called on userspace-mapped vmalloc pages during unmap -
: zap_pte_range() -> set_page_dirty()
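And for reference, here's roughly how the two union views line up after that patch, and why _pp_mapping_pad is there at all. This is a stand-alone mock with a static_assert rather than the kernel's mm_types.h; struct page_mock and its field order just mirror the layout above, with offsets as on a 64-bit build:

#include <stddef.h>
#include <assert.h>

struct address_space;
struct page_pool;
struct list_head { struct list_head *next, *prev; };

struct page_mock {
	unsigned long flags;
	union {
		struct {	/* page cache and anonymous pages */
			struct list_head lru;
			struct address_space *mapping;
			unsigned long index;
			unsigned long private;
		};
		struct {	/* page_pool used by netstack */
			unsigned long pp_magic;
			struct page_pool *pp;
			unsigned long _pp_mapping_pad;
			unsigned long dma_addr[2];
		};
	};
};

/*
 * The pad sits exactly on top of ->mapping and is simply never written,
 * so page_mapping() on a page_pool page keeps returning NULL, which is
 * what paths like zap_pte_range() -> set_page_dirty() rely on.
 */
static_assert(offsetof(struct page_mock, _pp_mapping_pad) ==
	      offsetof(struct page_mock, mapping),
	      "_pp_mapping_pad must alias ->mapping");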
> Possibly by making mapping/index be the first fields (for both the page cache and the page_pool thing) and then have the LRU list and the dma_addr be after that?
Unfortunately, we also use the bottom two bits of ->mapping for PageAnon / PageKSM / PageMovable. We could move ->mapping to the fifth word of the union, where it's less in the way.
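For anyone who wants to see that encoding, it's roughly this. It's a stand-alone paraphrase of include/linux/page-flags.h, not a verbatim copy: the helper names are mine, and they take the raw mapping word directly so the snippet compiles on its own:

#include <stdbool.h>

#define PAGE_MAPPING_ANON	0x1UL
#define PAGE_MAPPING_MOVABLE	0x2UL
#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

struct address_space;

/* 'm' is the raw value of page->mapping, cast to unsigned long. */

static inline bool mapping_is_anon(unsigned long m)
{
	return (m & PAGE_MAPPING_ANON) != 0;
}

static inline bool mapping_is_ksm(unsigned long m)
{
	return (m & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_KSM;
}

static inline bool mapping_is_movable(unsigned long m)
{
	return (m & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE;
}

/* page_mapping() has to strip these bits (and return NULL for anon/KSM
 * pages) before handing back the address_space pointer. */
static inline struct address_space *mapping_address_space(unsigned long m)
{
	if (m & PAGE_MAPPING_ANON)
		return NULL;
	return (struct address_space *)(m & ~PAGE_MAPPING_FLAGS);
}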
> > While I've got you on the subject of compound_head ... have you had a look at the folio work? It decreases the number of calls to compound_head() by about 25%, as well as shrinking the compiled size of the kernel and laying the groundwork for supporting things like 32kB anonymous pages and adaptive page sizes in the page cache. Andrew's a bit nervous of it, probably because it's such a large change.
>
> I guess I need to take a look. Mind sending me another pointer to the series?
This is probably a good place to start: https://lore.kernel.org/lkml/20210505150628.111735-10-willy@infradead.org/ or if you'd rather look at a git tree, https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/foli...
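If it helps to see the shape of it before diving in: every struct page helper has to do the tail-page check that compound_head() does, while a struct folio is guaranteed never to point at a tail page, so folio code pays for that check once at the boundary instead of in every accessor. A simplified sketch of the idea follows (it paraphrases compound_head(); sketch_compound_head(), sketch_page_folio() and the stripped-down folio definition are illustrations, not code from the tree):

/* Bit 0 of the first union word marks a tail page; the rest of that
 * word points at the head page. */
static inline struct page *sketch_compound_head(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	if (head & 1)
		return (struct page *)(head - 1);
	return page;
}

/*
 * A folio is a struct page that is guaranteed not to be a tail page,
 * so code working on folios can skip the check above in every accessor
 * and only pay for it when converting a page to a folio.
 */
struct folio {
	struct page page;
};

static inline struct folio *sketch_page_folio(struct page *page)
{
	return (struct folio *)sketch_compound_head(page);
}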