On 12/21/18 2:28 AM, Kirill A. Shutemov wrote:
On Tue, Dec 18, 2018 at 02:35:57PM -0800, Mike Kravetz wrote:
Instead of writing the required complicated code for this rare occurrence, just eliminate the race. i_mmap_rwsem is now held in read mode for the duration of page fault processing. Hold i_mmap_rwsem longer in truncation and hole punch code to cover the call to remove_inode_hugepages.
One of remove_inode_hugepages() callers is noticeably missing -- hugetlbfs_evict_inode(). Why?
It at least deserves a comment on why the lock rule doesn't apply to it.
In the case of hugetlbfs_evict_inode, the vfs layer guarantees there are no more users of the inode/file. Therefore, it is safe to call remove_inode_hugepages without holding the semaphore. However, I did add this comment to remove_inode_hugepages:
 * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
 * races with page faults.
So, I violated the rule that I documented. Thanks for catching.
I will update the comments to note this exception to the rule. Another option is to simply take the semaphore and still note why it is technically not needed. Since there are no other users, there will be no contention on the semaphore and the overhead should be negligible.