On Mon, May 15, 2023 at 09:20:44AM +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Blindly expanding the readahead windows will cause unneccessary
> pagecache thrashing and also will introdue the network workload.
s/introdue/introduce/
> We should disable expanding the windows if the readahead is disabled
> and also shouldn't expand the windows too much.
> 
> Expanding forward firstly instead of expanding backward for possible
> sequential reads.
> 
> Bound `rreq->len` to the actual file size to restore the previous page
> cache usage.
> 
> The posix_fadvise may change the maximum size of a file readahead.
> 
> Cc: stable@vger.kernel.org
> Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
> URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu...
> URL: https://www.spinics.net/lists/ceph-users/msg76183.html
> Cc: Hu Weiwen <sehuww@mail.scut.edu.cn>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 93fff1a7373f..4b29777c01d7 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -188,16 +188,42 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
>  	struct inode *inode = rreq->inode;
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_file_layout *lo = &ci->i_layout;
> +	unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
> +	loff_t end = rreq->start + rreq->len, new_end;
> +	struct ceph_netfs_request_data *priv = rreq->netfs_priv;
> +	unsigned long max_len;
>  	u32 blockoff;
> -	u64 blockno;
>  
> -	/* Expand the start downward */
> -	blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> -	rreq->start = blockno * lo->stripe_unit;
> -	rreq->len += blockoff;
> +	if (priv) {
> +		/* Readahead is disabled by posix_fadvise POSIX_FADV_RANDOM */
> +		if (priv->file_ra_disabled)
> +			max_pages = 0;
> +		else
> +			max_pages = priv->file_ra_pages;
> +	}
> +
> +	/* Readahead is disabled */
> +	if (!max_pages)
> +		return;
>  
> -	/* Now, round up the length to the next block */
> -	rreq->len = roundup(rreq->len, lo->stripe_unit);
> +	max_len = max_pages << PAGE_SHIFT;
> +
> +	/*
> +	 * Try to expand the length forward by rounding  up it to the next

There is an extra space between "rounding" and "up".
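To double-check the new window arithmetic, here is a minimal userspace
model of it (the helper name, the 4 MiB stripe unit, the 8 MiB window
and the example request are all mine, purely for illustration; the
authoritative logic is the kernel code quoted above):

/*
 * Minimal userspace model of the patched expansion logic;
 * example values only, not taken from the patch.
 */
#include <stdint.h>
#include <stdio.h>

#define STRIPE_UNIT	(4ULL << 20)	/* assume a 4 MiB stripe unit */

static void expand_readahead(uint64_t *start, uint64_t *len,
			     uint64_t i_size, uint64_t max_len)
{
	uint64_t end = *start + *len, new_end, blockoff;

	if (!max_len)	/* readahead disabled: do not expand at all */
		return;

	/* Expand the length forward to the next block, bounded by EOF */
	new_end = (end + STRIPE_UNIT - 1) / STRIPE_UNIT * STRIPE_UNIT;
	if (new_end > i_size)
		new_end = i_size;
	if (new_end > end && new_end <= *start + max_len)
		*len = new_end - *start;

	/* Then try to expand the start backward to a block boundary */
	blockoff = *start % STRIPE_UNIT;
	if (*len + blockoff <= max_len) {
		*start -= blockoff;
		*len += blockoff;
	}
}

int main(void)
{
	/* 16 KiB read at offset 5 MiB, EOF at 5 MiB + 100 KiB, 8 MiB window */
	uint64_t start = 5ULL << 20, len = 16 << 10;

	expand_readahead(&start, &len, (5ULL << 20) + (100 << 10), 8ULL << 20);
	printf("start=%llu len=%llu\n",
	       (unsigned long long)start, (unsigned long long)len);
	return 0;
}

With these numbers it prints "start=4194304 len=1150976": the request is
expanded backward to the stripe boundary, but the forward expansion now
stops at EOF, where the old code would have rounded the length up to a
full stripe past it.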
Apart from the above two typos, LGTM.

Reviewed-by: Hu Weiwen <sehuww@mail.scut.edu.cn>

I also tested this patch with our workload. Reading the first 16k images
from the ImageNet dataset (1.69 GiB) takes about 1.8 GiB of page cache
(as reported by `free -h'). This is expected.

For the fadvise use-case, I used `fio' to run this test:

$ fio --name=rand --size=32M --fadvise_hint=1 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

After the test, the page cache increased by about 35 MiB, which is
expected. So if appropriate:

Tested-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
However, also note that random reads of a large file without fadvise
still suffer from the degradation. For example, this test:

$ fio --name=rand --size=32M --fadvise_hint=0 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

will load nearly every page of the 2 GiB test file into the page cache,
although I only need 32 MiB of it.
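If it helps anyone hitting that case: the knob fio exercises with
--fadvise_hint=1 for a randread job is, as far as I can tell, just
posix_fadvise(2) with POSIX_FADV_RANDOM, so an application that knows
its pattern is random can opt out of readahead itself. A rough sketch
(not tested against this patch):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int err, fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* offset 0 and len 0 apply the advice to the whole file */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
	if (err)	/* returns the error number, does not set errno */
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

	/* ... random reads via fd should now skip readahead expansion ... */

	close(fd);
	return 0;
}

If I read the patch right, that advice is what ends up setting
priv->file_ra_disabled, so ceph_netfs_expand_readahead() bails out
before expanding anything.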
> +	 * block, but do not exceed the file size, unless the original
> +	 * request already exceeds it.
> +	 */
> +	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
> +	if (new_end > end && new_end <= rreq->start + max_len)
> +		rreq->len = new_end - rreq->start;
> +
> +	/* Try to expand the start downward */
> +	div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> +	if (rreq->len + blockoff <= max_len) {
> +		rreq->start -= blockoff;
> +		rreq->len += blockoff;
> +	}
>  }
>  
>  static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)
> -- 
> 2.40.1