On Tue, Nov 19, 2024 at 2:30 AM Pasha Tatashin pasha.tatashin@soleen.com wrote:
Can you point me to where a refcounted reference to the page comes from when page_detective_metadata() calls dump_page_lvl()?
I am sorry, I remembered incorrectly, we are getting reference right after dump_page_lvl() in page_detective_memcg() -> folio_try_get(); I will move the folio_try_get() to before dump_page_lvl().
So I think dump_page() in its current form is not something we should expose to a userspace-reachable API.
We use dump_page() all over WARN_ONs in MM code where pages might not be locked, but this is a good point, that while even the existing usage might be racy, providing a user-reachable API potentially makes it worse. I will see if I could add some locking before dump_page(), or make a dump_page variant that does not do dump_mapping().
To be clear, I am not that strongly opposed to racily reading data such that the data may not be internally consistent or such; but this is a case of racy use-after-free reads that might end up dumping entirely unrelated memory contents into dmesg. I think we should properly protect against that in an API that userspace can invoke. Otherwise, if we race, we might end up writing random memory contents into dmesg; and if we are particularly unlucky, those random memory contents could be PII or authentication tokens or such.
I'm not entirely sure what the right approach is here; I guess it makes sense that when the kernel internally detects corruption, dump_page doesn't take references on pages it accesses to avoid corrupting things further. If you are looking at a page based on a userspace request, I guess you could access the page with the necessary locking to access its properties under the normal locking rules?
I will take reference, as we already do that for memcg purpose, but have not included dump_page().
Note that taking a reference on the page does not make all of dump_page() fine; in particular, my understanding is that folio_mapping() requires that the page is locked in order to return a stable pointer, and some of the code in dump_mapping() would probably also require some other locks - probably at least on the inode and maybe also on the dentry, I think? Otherwise the inode's dentry list can probably change concurrently, and the dentry's name pointer can change too.