On Mon, Nov 11, 2024 at 6:33 PM David Hildenbrand david@redhat.com wrote:
On 08.11.24 15:17, Yafang Shao wrote:
When testing large folio support with XFS on our servers, we observed that only a few large folios are mapped when reading large files via mmap. After a thorough analysis, I identified it was caused by the `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this parameter is set to 128KB. After I tune it to 2MB, the large folio can work as expected. However, I believe the large folio behavior should not be dependent on the value of read_ahead_kb. It would be more robust if the kernel can automatically adopt to it.
Now I am extremely confused.
Documentation/ABI/stable/sysfs-block:
"[RW] Maximum number of kilobytes to read-ahead for filesystems on this block device."
So, with your patch, will we also be changing the readahead size to exceed that, or simply allocate larger folios and not exceeding the readahead size (e.g., leaving them partially non-filled)?
Exceeding the readahead size for the MADV_HUGEPAGE case is straightforward; this is what the current patch accomplishes.
If you're also changing the readahead behavior to exceed the configuration parameter it would sound to me like "I am pushing the brake pedal and my care brakes; fix the brakes to adopt whether to brake automatically" :)
Likely I am missing something here, and how the read_ahead_kb parameter is used after your patch.
The read_ahead_kb parameter continues to function for non-MADV_HUGEPAGE scenarios, whereas special handling is required for the MADV_HUGEPAGE case. It appears that we ought to update the Documentation/ABI/stable/sysfs-block to reflect the changes related to large folios, correct?
With /sys/block/*/queue/read_ahead_kb set to 128KB and performing a sequential read on a 1GB file using MADV_HUGEPAGE, the differences in /proc/meminfo are as follows:
before this patch FileHugePages: 18432 kB FilePmdMapped: 4096 kB
after this patch FileHugePages: 1067008 kB FilePmdMapped: 1048576 kB
This shows that after applying the patch, the entire 1GB file is mapped to huge pages. The stable list is CCed, as without this patch, large folios don’t function optimally in the readahead path.
It's worth noting that if read_ahead_kb is set to a larger value
that isn't
aligned with huge page sizes (e.g., 4MB + 128KB), it may still fail to map to hugepages.
Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings") Suggested-by: Matthew Wilcox willy@infradead.org Signed-off-by: Yafang Shao laoar.shao@gmail.com Cc: stable@vger.kernel.org
mm/readahead.c | 2 ++ 1 file changed, 2 insertions(+)
Changes: v1->v2:
- Drop the align (Matthew)
- Improve commit log (Andrew)
RFC->v1: https://lore.kernel.org/linux-mm/20241106092114.8408-1-laoar.shao@gmail.com/
- Simplify the code as suggested by Matthew
RFC: https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@gmail.com...
diff --git a/mm/readahead.c b/mm/readahead.c index 3dc6c7a128dd..9b8a48e736c6 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -385,6 +385,8 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra, return 4 * cur; if (cur <= max / 2) return 2 * cur;
if (cur > max)
return cur; return max;
Maybe something like
return max_t(unsigned long, cur, max);
might be more readable (likely "max()" cannot be used because of the local variable name "max" ...).
... but it's rather weird having a "max" and then returning something larger than the "max" ... especially with code like
Indeed, that could lead to confusion ;)
"ra->size = get_next_ra_size(ra, max_pages);"
Maybe we can improve that by renaming "max_pages" / "max" to what it actually is supposed to be (which I haven't quite understood yet).
Perhaps a more straightforward solution would be to implement it directly at the callsite, as demonstrated below?
diff --git a/mm/readahead.c b/mm/readahead.c index 3dc6c7a128dd..187efae95b02 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -642,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl, 1UL << order); if (index == expected) { ra->start += ra->size; - ra->size = get_next_ra_size(ra, max_pages); + /* + * Allow the actual size to exceed the readahead window for a + * large folio. + */ + ra->size = get_next_ra_size(ra, max(max_pages, ra->size)); ra->async_size = ra->size; goto readit; }
-- Regards Yafang