On Tue, Nov 19, 2024 at 7:52 AM Jann Horn jannh@google.com wrote:
On Tue, Nov 19, 2024 at 2:30 AM Pasha Tatashin pasha.tatashin@soleen.com wrote:
Can you point me to where a refcounted reference to the page comes from when page_detective_metadata() calls dump_page_lvl()?
I am sorry, I remembered incorrectly, we are getting reference right after dump_page_lvl() in page_detective_memcg() -> folio_try_get(); I will move the folio_try_get() to before dump_page_lvl().
So I think dump_page() in its current form is not something we should expose to a userspace-reachable API.
We use dump_page() all over WARN_ONs in MM code where pages might not be locked, but this is a good point, that while even the existing usage might be racy, providing a user-reachable API potentially makes it worse. I will see if I could add some locking before dump_page(), or make a dump_page variant that does not do dump_mapping().
To be clear, I am not that strongly opposed to racily reading data such that the data may not be internally consistent or such; but this is a case of racy use-after-free reads that might end up dumping entirely unrelated memory contents into dmesg. I think we should properly protect against that in an API that userspace can invoke. Otherwise, if we race, we might end up writing random memory contents into dmesg; and if we are particularly unlucky, those random memory contents could be PII or authentication tokens or such.
I'm not entirely sure what the right approach is here; I guess it makes sense that when the kernel internally detects corruption, dump_page doesn't take references on pages it accesses to avoid corrupting things further. If you are looking at a page based on a userspace request, I guess you could access the page with the necessary locking to access its properties under the normal locking rules?
I will take reference, as we already do that for memcg purpose, but have not included dump_page().
Note that taking a reference on the page does not make all of dump_page() fine; in particular, my understanding is that folio_mapping() requires that the page is locked in order to return a stable pointer, and some of the code in dump_mapping() would probably also require some other locks - probably at least on the inode and maybe also on the dentry, I think? Otherwise the inode's dentry list can probably change concurrently, and the dentry's name pointer can change too.
Agreed, once reference is taken, the page identity cannot change (i.e. if it is a named page it will stay a named page), but dentry can be renamed. I will look into what can be done to guarantee consistency in the next version. There is also a fallback if locking cannot be reliably resolved (i.e. for performance reasons) where we can make dump_mapping() optionally disabled from dump_page_lvl() with a new argument flag.
Thank you, Pasha