On Thu, Jul 11, 2019 at 05:04:55PM -0700, Andrew Morton wrote:
> On Thu, 11 Jul 2019 14:58:38 +0200 Jan Kara <jack@suse.cz> wrote:
> > buffer_migrate_page_norefs() can race with bh users in the following way:
> > CPU1                                     CPU2
> > buffer_migrate_page_norefs()
> >   buffer_migrate_lock_buffers()
> >   checks bh refs
> >   spin_unlock(&mapping->private_lock)
> >                                          __find_get_block()
> >                                            spin_lock(&mapping->private_lock)
> >                                            grab bh ref
> >                                            spin_unlock(&mapping->private_lock)
> >   move page
> >                                          do bh work
> > This can result in various issues like lost updates to buffers (i.e.
> > metadata corruption) or use-after-free issues for the old page.
> > Closing this race window is relatively difficult. We could hold
> > mapping->private_lock in buffer_migrate_page_norefs() until we are
> > finished with migrating the page but the lock hold times would be
> > rather long. So let's revert to a more careful variant of page
> > migration requiring eviction of buffers on the migrated page. This is
> > effectively fallback_migrate_page() that additionally invalidates bh
> > LRUs in case try_to_free_buffers() failed.
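For concreteness, a minimal sketch of what that eviction-based fallback
could look like is below. This illustrates the approach rather than the
actual patch: buffer_migrate_page_evict() is a hypothetical name, and
writeout() is assumed to be the static helper in mm/migrate.c, so the
function would have to live in that file.

	#include <linux/buffer_head.h>
	#include <linux/migrate.h>

	static int buffer_migrate_page_evict(struct address_space *mapping,
					     struct page *newpage,
					     struct page *page,
					     enum migrate_mode mode)
	{
		if (!page_has_buffers(page))
			return migrate_page(mapping, newpage, page, mode);

		/* Dirty buffers must be written back before they can be dropped. */
		if (PageDirty(page)) {
			if (mode != MIGRATE_SYNC)
				return -EBUSY;
			/* writeout() is the static helper in mm/migrate.c */
			if (writeout(mapping, page))
				return -EAGAIN;
		}

		if (!try_to_free_buffers(page)) {
			/*
			 * The buffers may be pinned only by the per-CPU bh
			 * LRUs; invalidate those and retry once before
			 * failing the migration.
			 */
			invalidate_bh_lrus();
			if (!try_to_free_buffers(page))
				return -EAGAIN;
		}

		return migrate_page(mapping, newpage, page, mode);
	}

The point is that a buffer pinned through the bh LRUs is never migrated
out from under its user; the migration simply fails with -EAGAIN and is
retried or abandoned.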
> Is this premature optimization? Holding ->private_lock while messing
> with the buffers would be the standard way of addressing this. The
> longer hold times *might* be an issue, but we don't know this, do we?
> If there are indeed such problems then they could be improved by, say,
> doing more of the newpage preparation prior to taking ->private_lock.
To some extent, we do not know how much of a problem this patch will be
either, or what the impact of avoiding dirty block pages during
migration is. So both approaches have their downsides.
However, failing a high-order allocation is typically benign, and it is
an inevitable problem that depends on the workload. I don't think we
could ever hit a case where there was enough spinning to cause a soft
lockup but, on the other hand, I don't think there is much scope for
doing more of the preparation steps before acquiring private_lock
either.
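For reference, the locking alternative would roughly amount to doing the
reference check and the page move under a single mapping->private_lock
critical section. A sketch of the check half, using the real b_count and
b_this_page fields of struct buffer_head but a hypothetical helper name:

	#include <linux/buffer_head.h>

	/*
	 * Sketch only: returns true if no buffer on the page holds a
	 * reference. The caller must already hold mapping->private_lock
	 * and, crucially, must keep holding it across the page move for
	 * the check to remain valid against __find_get_block().
	 * Assumes page_has_buffers(page) is true.
	 */
	static bool page_bh_refs_ok(struct page *page)
	{
		struct buffer_head *head = page_buffers(page);
		struct buffer_head *bh = head;

		do {
			if (atomic_read(&bh->b_count))
				return false; /* __find_get_block() won the race */
			bh = bh->b_this_page;
		} while (bh != head);

		return true;
	}

Keeping the lock held across the copy is what closes the window, at the
cost of the longer hold times discussed above.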