On Fri, Dec 8, 2023 at 1:24 AM Kairui Song ryncsn@gmail.com wrote:
Yu Zhao yuzhao@google.com wrote on Fri, Dec 8, 2023 at 14:14:
Unmapped folios accessed through file descriptors can be underprotected. Those folios are added to the oldest generation based on:
- The fact that they are less costly to reclaim (no need to walk the rmap and flush the TLB) and have less impact on performance (don't cause major PFs and can be non-blocking if needed again).
- The observation that they are likely to be single-use. E.g., for client use cases like Android, its apps parse configuration files and store the data in heap (anon); for server use cases like MySQL, it reads from InnoDB files and holds the cached data for tables in buffer pools (anon).
However, the oldest generation can be very short lived, and if so, it doesn't provide the PID controller with enough time to respond to a surge of refaults. (Note that the PID controller uses weighted refaults and those from evicted generations only take a half of the whole weight.) In other words, for a short lived generation, the moving average smooths out the spike quickly.
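The smoothing effect described above can be illustrated with a toy model. This is only a sketch: it assumes the running average is updated as (average + new refaults) / 2 each time a generation is evicted, which is one reading of "take a half of the whole weight". The names carry_over and weighted_refaults are hypothetical and do not come from the patch.

```python
def carry_over(avg_refaulted, refaulted):
    """Fold one evicted generation's refaults into the running average.

    Assumed update rule: history and the new sample each get half of
    the total weight.
    """
    return (avg_refaulted + refaulted) / 2


def weighted_refaults(per_generation_refaults):
    """Return the running average after each generation is evicted."""
    avg = 0.0
    history = []
    for refaulted in per_generation_refaults:
        avg = carry_over(avg, refaulted)
        history.append(avg)
    return history


# One generation sees a surge of 1000 refaults; the following ones see none.
trace = weighted_refaults([0, 1000, 0, 0, 0])
print(trace)
```

Under this assumption, the surge is halved at every eviction. If generations turn over quickly, those halvings happen within a short wall-clock interval, so the spike can fade before the controller has reacted to it.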
To fix the problem:
- For folios that are already on LRU, if they can be beyond the tracking range of tiers, i.e., five accesses through file descriptors, move them to the second oldest generation to give them more time to age. (Note that tiers are used by the PID controller to statistically determine whether folios accessed multiple times through file descriptors are worth protecting.)
- When adding unmapped folios to LRU, adjust the placement of them so that they are not too close to the tail. The effect of this is similar to the above.
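The first rule above can be sketched as follows. This is a toy model, not the kernel code: the function name, the threshold constant, and the convention that a smaller sequence number means an older generation are all assumptions made for illustration, as is treating the fifth access as the point where a folio leaves the tracking range of tiers. The second rule (placement of newly added folios) is not modeled.

```python
# Assumption: at five file-descriptor accesses, tiers stop tracking the folio.
MAX_TRACKED_ACCESSES = 5


def target_generation(fd_accesses, oldest, youngest):
    """Pick a generation for an unmapped folio that is already on the LRU.

    oldest and youngest are sequence numbers; a smaller number means an
    older generation (assumed convention).
    """
    beyond_tiers = fd_accesses >= MAX_TRACKED_ACCESSES
    has_second_oldest = oldest < youngest
    if beyond_tiers and has_second_oldest:
        # Give the folio more time to age than the oldest generation allows.
        return oldest + 1
    return oldest


print(target_generation(1, 10, 13))  # rarely accessed folio
print(target_generation(5, 10, 13))  # folio beyond the tracking range
```

Folios within the tracking range stay in the oldest generation, where the tiers and the PID controller decide whether they are worth protecting.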
On Android, launching 55 apps sequentially:

                           Before     After      Change
  workingset_refault_anon  25641024   25598972   0%
  workingset_refault_file  115016834  106178438  -8%
Hi Yu,
Thank you for your amazing work on MGLRU.
I believe this is similar to the issue I was trying to resolve previously: https://lwn.net/Articles/945266/ The idea is to use refault distance to decide whether a page should be placed in the oldest generation or some other generation. Per my tests this worked very well, and we have been using refault distance for MGLRU in multiple workloads.
A few issues are left in my previous RFC series, e.g., anon pages in MGLRU shouldn't be considered. I wanted to collect feedback or test cases, but unfortunately it didn't seem to get much attention upstream.
I think both this patch and my previous series aim to solve the file page underprotection issue. I did a quick test using this series, and for the MongoDB test, refault distance still seems to be the better solution (I'm not saying these two optimizations are mutually exclusive, just that they conflict somewhat in implementation and solve a similar problem):
Previous result:
Execution Results after 905 seconds
              Executed  Time (µs)      Rate
STOCK_LEVEL   2542      27121571486.2  0.09 txn/s
TOTAL         2542      27121571486.2  0.09 txn/s
This patch:
Execution Results after 900 seconds
              Executed  Time (µs)      Rate
STOCK_LEVEL   1594      27061522574.4  0.06 txn/s
TOTAL         1594      27061522574.4  0.06 txn/s
The unpatched version is always around 500.
Thanks for the test results!
I think there are a few points here:
- Refault distance makes use of page shadows, so it can better
distinguish evicted pages with different access patterns (re-access distances).
- Throttled refault distance can help hold part of the working set when
memory is too small to hold the whole working set.
So maybe part of this patch and bits of the previous series can be combined to work better on this issue. What do you think?
I'll try to find some time this week to look at your RFC. It'd be a lot easier for me if you could share 1. your latest tree, preferably based on the mainline, and 2. your VM image containing the above test.