On 6/21/19 4:07 PM, Bharath Vedartham wrote:
Do you think this could cause a race condition between __setup_per_zone_wmarks and pgdat_watermark_boosted which checks whether the watermark_boost of each zone is non-zero? pgdat_watermark_boosted is not called with a zone lock. Here is a probable case scenario: watermarks are boosted in steal_suitable_fallback(which happens under a zone lock). After that kswapd is woken up by wakeup_kswapd(zone,0,0,zone_idx(zone)) in rmqueue without holding a zone lock. Lets say someone modified min_kfree_bytes, this would lead to all the zone->watermark_boost being set to 0. This may cause pgdat_watermark_boosted to return false, which would not wakeup kswapd as intended by boosting the watermark. This behaviour is similar to waking up kswapd for a balanced node.
Not waking up kswapd shouldn't cause a significant trouble.
Also if kswapd was woken up successfully because of watermarks being boosted. In balance_pgdat, we use nr_boost_reclaim to count number of pages to reclaim because of boosting. nr_boost_reclaim is calculated as: nr_boost_reclaim = 0; for (i = 0; i <= classzone_idx; i++) { zone = pgdat->node_zones + i; if (!managed_zone(zone)) continue;
nr_boost_reclaim += zone->watermark_boost; zone_boosts[i] = zone->watermark_boost; } boosted = nr_boost_reclaim;
This is not under a zone_lock. This could lead to nr_boost_reclaim to be 0 if min_kfree_bytes is set to 0. Which would wake up kcompactd without reclaiming memory.
Setting min_kfree_bytes to 0 is asking for problems regardless of this check. Much more trouble than waking up kcompactd spuriously, which is just a few wasted cpu cycles.
kcompactd compaction might be spurious if the if the memory reclaim step is not happening?
Any thoughts?
Unless the races cause either some data corruption, or e.g. spurious allocation failures, I don't think they are worth adding new spinlock sections.
Thanks, Vlastimil
spin_unlock_irqrestore(&zone->lock, flags);