On 2024/4/8 4:31, Oscar Salvador wrote:
Totally unexpected, as this commit even removed hwpoison_entry_to_pfn(). Obviously, until now I had assumed the hwpoison entry was accounted as a pfn swap entry, but it's just missing.
Since this commit didn't really change is_pfn_swap_entry() itself, I was thinking maybe an older Fixes tag would apply, but then I noticed the old code should indeed work fine even with the hwpoison entry missing. For example, it's a grey area whether a hwpoisoned page should be accounted in smaps. So I think the Fixes tag is correct, and thanks for fixing this.
Reviewed-by: Peter Xu <peterx@redhat.com>
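For readers following along, here is a minimal userspace sketch of the point about is_pfn_swap_entry(): a predicate that decides whether an entry's offset field may be read as a pfn has to list every pfn-carrying kind, hwpoison included. All toy_* names and the entry layout are assumptions made up for illustration; this is not the kernel's encoding or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry kinds for illustration; not the kernel's encoding. */
enum toy_entry_type {
	TOY_SWAP,           /* offset is a swap slot, not a pfn */
	TOY_MIGRATION,      /* offset carries a pfn */
	TOY_DEVICE_PRIVATE, /* offset carries a pfn */
	TOY_HWPOISON,       /* offset carries a pfn */
};

struct toy_entry {
	enum toy_entry_type type;
	unsigned long offset;
};

/* True for every entry kind whose offset field holds a pfn. */
static bool toy_is_pfn_entry(struct toy_entry e)
{
	return e.type == TOY_MIGRATION ||
	       e.type == TOY_DEVICE_PRIVATE ||
	       e.type == TOY_HWPOISON; /* the kind discussed in this thread */
}

/* Store the pfn in *pfn and return true only for pfn-carrying entries. */
static bool toy_entry_to_pfn(struct toy_entry e, unsigned long *pfn)
{
	assert(pfn != NULL);
	if (!toy_is_pfn_entry(e))
		return false;
	*pfn = e.offset;
	return true;
}
```

If TOY_HWPOISON were dropped from toy_is_pfn_entry(), toy_entry_to_pfn() would refuse a poisoned entry even though its offset does hold a pfn, which is the kind of mismatch described above.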
Thanks Peter
Thanks both.
Fedora stopped enabling DEBUG_VM some time ago, but I'm not sure whether it was still enabled in the 6.1 trees. In that regard, cc stable still looks reasonable.
Good to know, thanks for the info.
A side note: while looking at this, I went back to see why in some cases we need the pfn maintained for the poisoned entry. The only user I found is check_hwpoisoned_entry(), which wants to do fast kills in some contexts, and that includes a double check on the pfn in a poisoned entry. Afaict this path is just too rarely used, and buggy.
Yes, unfortunately memory-failure code does not get exercised that much, and so there might be subtle bugs lurking in there for quite some time.
There are many memory-failure testcases, but some code paths still don't get exercised. That's a pity. :(
A few things we may need fixing, maybe someone in the loop would have time to have a look:
- check_hwpoisoned_entry()
  - pte_none check is missing
  - all the other swap types are missing (e.g., we want to kill the proc too if the page is under migration)
At first I thought the other swap types just wouldn't exist in this code path. But on second thought, it seems possible. For example, when a page is being isolated for migration, memory_failure will fail to isolate it. A second MCE event will then go to kill_accessing_process() and see a migration swap entry.
- check_hwpoisoned_pmd_entry()
  - needs similar care as above (pmd_none is covered, but not the others)
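One possible shape of the checks listed above, as a self-contained userspace sketch rather than a proposed patch. The toy_pte type, its kinds, and toy_pte_hits_poisoned_pfn() are hypothetical names invented here; the real helpers operate on pte_t and swap entries. The sketch assumes that an empty entry and an ordinary swap entry carry no pfn, while present, hwpoison, and migration entries do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what a page-table entry can hold. */
enum toy_pte_kind {
	TOY_PTE_NONE,      /* empty entry, nothing mapped */
	TOY_PTE_PRESENT,   /* mapped page, pfn is valid */
	TOY_PTE_SWAP,      /* ordinary swap entry, no pfn */
	TOY_PTE_HWPOISON,  /* poisoned entry, pfn is valid */
	TOY_PTE_MIGRATION, /* page under migration, pfn is valid */
};

struct toy_pte {
	enum toy_pte_kind kind;
	unsigned long pfn; /* meaningful only for pfn-carrying kinds */
};

/*
 * Return true when the entry refers to poisoned_pfn, i.e. when the
 * accessing process would be a candidate for a kill in this model.
 */
static bool toy_pte_hits_poisoned_pfn(const struct toy_pte *pte,
				      unsigned long poisoned_pfn)
{
	assert(pte != NULL);

	switch (pte->kind) {
	case TOY_PTE_NONE: /* explicit "none" check */
	case TOY_PTE_SWAP: /* offset is not a pfn, never a match */
		return false;
	case TOY_PTE_PRESENT:
	case TOY_PTE_HWPOISON:
	case TOY_PTE_MIGRATION: /* also match a page under migration */
		return pte->pfn == poisoned_pfn;
	}
	return false;
}
```

Using an explicit kind check instead of treating pfn 0 as "no pfn" keeps the empty case separate from a genuine comparison; whether the real code wants that distinction is left open here.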
I will have a look and see what needs fixing, thanks for bringing it up.
Thanks for your time.