On Wed 17-04-24 12:33:39, Zach O'Keefe wrote:
On Wed, Apr 17, 2024 at 4:10 AM Jan Kara <jack@suse.cz> wrote:
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index cd4e4ae77c40a..02147b61712bc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1638,7 +1638,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 	 */
 	dtc->wb_thresh = __wb_calc_thresh(dtc);
 	dtc->wb_bg_thresh = dtc->thresh ?
-		div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+		div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
...
Thirdly, if thresholds are larger than 1<<32 pages, then dirty balancing is going to blow up in many other spectacular ways. Consider only the multiplication on this line: it will not necessarily fit into u64 anymore. The whole dirty limiting code is interspersed with assumptions that limits are actually within u32, and we do our calculations in unsigned longs to avoid worrying about overflows (with occasional casting to u64 to make it more interesting, because people expected those entities to overflow 32 bits even on 32-bit archs). Which is lame, I agree, but so far people don't seem to be setting limits to 16TB or more. And I'm not really worried about security here since this is a global-root-only tunable, and root has much better ways to DoS the system.
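To make the overflow concern concrete, here is a minimal user-space sketch (not kernel code). The helper names mul_u64_overflows() and pages_to_bytes() are made up for illustration, and the 4 KiB page size used in the note below is an assumption. The sketch checks whether a product of two page counts still fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if a * b would not fit in 64 bits. */
static int mul_u64_overflows(uint64_t a, uint64_t b)
{
	return a != 0 && b > UINT64_MAX / a;
}

/*
 * Hypothetical helper: convert a page count to bytes. The caller
 * supplies the page size (for example 4096); the assert documents the
 * assumption that the result fits in 64 bits.
 */
static uint64_t pages_to_bytes(uint64_t pages, uint64_t page_size)
{
	assert(!mul_u64_overflows(pages, page_size));
	return pages * page_size;
}
```

With both factors at UINT32_MAX, mul_u64_overflows() returns 0, so limits within u32 range are safe to multiply in u64. With both factors at 1<<33 it returns 1. pages_to_bytes((uint64_t)1 << 32, 4096) is 1<<44 bytes, which is where the 16TB figure comes from under the 4 KiB assumption.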
So overall I'm all for cleaning up this code but in a sensible way please. E.g. for these overflow issues at least do it one function at a time so that we can sensibly review it.
Andrew, can you please revert this patch until we have a better fix? So far it does more harm than good... Thanks!
Shall we just roll-forward with a suitable fix? I think all the original code actually "needed" was to cast the ternary predicate, like:
---8<---
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fba324e1a010..ca1bfc0c9bdd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1637,8 +1637,8 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 	 * at some rate <= (write_bw / 2) for bringing down wb_dirty.
 	 */
 	dtc->wb_thresh = __wb_calc_thresh(dtc);
-	dtc->wb_bg_thresh = dtc->thresh ?
-		div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+	dtc->wb_bg_thresh = (u32)dtc->thresh ?
+		div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
Well, this would fix the division by 0, but when you read the code you really start wondering what's going on :) And as I wrote above, when thresholds pass UINT_MAX, the dirty limiting code breaks down anyway, so I don't think the machine will be more usable after your fix. Would you be up for a challenge to modify mm/page-writeback.c so that such huge limits cannot be set instead? That would actually be a useful fix...
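For illustration only, a check of the kind suggested here might look roughly like the user-space sketch below. This is not the kernel's code: dirty_limit_fits_u32() is a hypothetical name, and taking the limit in bytes plus an explicit page size are assumptions made so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: does a byte limit translate to a page count that
 * still fits in 32 bits? Returns 1 if it fits, 0 if it does not.
 */
static int dirty_limit_fits_u32(uint64_t limit_bytes, uint64_t page_size)
{
	uint64_t pages;

	assert(page_size != 0);

	/* Round up to whole pages without risking an addition overflow. */
	pages = limit_bytes / page_size + (limit_bytes % page_size != 0);

	return pages <= UINT32_MAX;
}
```

A caller would refuse the new value when this returns 0, so the later calculations never see a page count above UINT_MAX.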
Honza
 	/*
 	 * In order to avoid the stacked BDI deadlock we need
---8<---
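The effect of casting the ternary predicate in the snippet above can be sketched in user space. This is a stand-in, not the kernel implementation: sketch_div_u64() only mimics the shape of div_u64() (64-bit dividend, 32-bit divisor), sketch_wb_bg_thresh() is a hypothetical name, and uint64_t stands in for unsigned long on a 64-bit machine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for a 64-by-32 division helper. The divisor parameter is
 * only 32 bits wide, so any wider value has to be truncated before it
 * gets here.
 */
static uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0); /* division by zero is the bug under discussion */
	return dividend / divisor;
}

/*
 * Mirrors the shape of the proposed line: the predicate and the divisor
 * are both the low 32 bits of thresh, so they can no longer disagree.
 */
static uint64_t sketch_wb_bg_thresh(uint64_t wb_thresh, uint64_t bg_thresh,
				    uint64_t thresh)
{
	return (uint32_t)thresh ?
		sketch_div_u64(wb_thresh * bg_thresh, (uint32_t)thresh) : 0;
}
```

For thresh = 1000, wb_thresh = 100 and bg_thresh = 500, the result is 50, as expected. For thresh = 1<<32 the low 32 bits are zero, so the predicate is false and the result is 0 rather than a division by zero. For thresh = (1<<32) + 4 the divisor silently becomes 4, so the division is safe but the result no longer reflects the real threshold.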
Thanks, and apologies for the inconvenience
Zach
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR