On Tue, Nov 19, 2024 at 01:52:00PM +0100, Jann Horn wrote:
I will take reference, as we already do that for memcg purpose, but have not included dump_page().
Note that taking a reference on the page does not make all of dump_page() fine; in particular, my understanding is that folio_mapping() requires that the page is locked in order to return a stable pointer, and some of the code in dump_mapping() would probably also require some other locks - probably at least on the inode and maybe also on the dentry, I think? Otherwise the inode's dentry list can probably change concurrently, and the dentry's name pointer can change too.
First important thing is that we snapshot the page. So while we may have a torn snapshot of the page, it can't change under us any more, so we don't have to worry about it being swizzled one way and then swizzled back.
Second thing is that I think using folio_mapping() is actually wrong. We don't want the swap mapping if it's an anon page that's in the swapcache. We'd be fine just doing mapping = folio->mapping (we'd need to add a check for movable, but I think that's fine). Anyway, we know the folio isn't ksm or anon at the point that we call dump_mapping() because there's a chain of "else" statements. So I think we're fine because we can't switch between anon & file while holding a refcount.
Having a refcount on the folio will prevent the folio from being allocated to anything else again. It will not protect the mapping from being torn down (the folio can be truncated from the mapping, then the mapping can be freed, and the memory reused). As you say, the dentry can be renamed as well.
This patch series makes me nervous. I'd rather see it done as a bpf script or drgn script, but if it is going to be done in C, I'd really like to see more auditing of the safety here. It feels like the kind of hack that one deploys internally to debug a hard-to-hit condition, rather than the kind of code that we like to ship upstream.