On Thu, May 22, 2025 at 03:13:31PM +0800, Muchun Song wrote:
On May 22, 2025, at 13:34, Oscar Salvador osalvador@suse.de wrote:
On Thu, May 22, 2025 at 11:47:05AM +0800, Muchun Song wrote:
Thanks for fixing this problem. BTW, in order to catch future similar problem, it is better to add WARN_ON into folio_hstate() to assert if hugetlb_lock is not held when folio's reference count is zero. For this fix, LGTM.
Why cannot we put all the burden in alloc_and_dissolve_hugetlb_folio(), which will again check things under the lock?
I've also considered about this choice, because there is another similar case in isolate_or_dissolve_huge_page() which could benefit from this change. I am fine with both approaches. Anyway, adding an assertion into folio_hstate() is an improvement for capturing invalid users in the future. Because any user of folio_hstate() should hold a reference to folio or hold the hugetlb_lock to make sure it returns a valid hstate for a hugetlb folio.
Yes, I am not arguing about that, it makes perfect sense to me, but I am just kinda against these micro-optimizations for taking the lock to check things when we are calling in a function that will again the lock to check things.
Actually, I think that the folio_test_hugetlb() check in replace_free_hugepage_folios() was put there to try tro be smart and save cycles in case we were not dealing with a hugetlb page (so freed under us). Now that we learnt that we cannot do that without 1) taking a refcount 2) or holding the lock, that becomes superfluos, so I would just wipe that out.