From: Xiubo Li <xiubli@redhat.com>
Blindly expanding the readahead windows will cause unneccessary pagecache thrashing and also will introdue the network workload. We should disable expanding the windows if the readahead is disabled and also shouldn't expand the windows too much.
Expanding forward firstly instead of expanding backward for possible sequential reads.
Bound `rreq->len` to the actual file size to restore the previous page cache usage.
The posix_fadvise may change the maximum size of a file readahead.
Cc: stable@vger.kernel.org
Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu...
URL: https://www.spinics.net/lists/ceph-users/msg76183.html
Cc: Hu Weiwen <sehuww@mail.scut.edu.cn>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 93fff1a7373f..4b29777c01d7 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -188,16 +188,42 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
 	struct inode *inode = rreq->inode;
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_file_layout *lo = &ci->i_layout;
+	unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
+	loff_t end = rreq->start + rreq->len, new_end;
+	struct ceph_netfs_request_data *priv = rreq->netfs_priv;
+	unsigned long max_len;
 	u32 blockoff;
-	u64 blockno;
 
-	/* Expand the start downward */
-	blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
-	rreq->start = blockno * lo->stripe_unit;
-	rreq->len += blockoff;
+	if (priv) {
+		/* Readahead is disabled by posix_fadvise POSIX_FADV_RANDOM */
+		if (priv->file_ra_disabled)
+			max_pages = 0;
+		else
+			max_pages = priv->file_ra_pages;
+
+	}
 
-	/* Now, round up the length to the next block */
-	rreq->len = roundup(rreq->len, lo->stripe_unit);
+	/* Readahead is disabled */
+	if (!max_pages)
+		return;
+
+	max_len = max_pages << PAGE_SHIFT;
+
+	/*
+	 * Try to expand the length forward by rounding  up it to the next
+	 * block, but do not exceed the file size, unless the original
+	 * request already exceeds it.
+	 */
+	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
+	if (new_end > end && new_end <= rreq->start + max_len)
+		rreq->len = new_end - rreq->start;
+
+	/* Try to expand the start downward */
+	div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
+	if (rreq->len + blockoff <= max_len) {
+		rreq->start -= blockoff;
+		rreq->len += blockoff;
+	}
 }
 
 static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)
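For reference, the new expansion logic above can be sketched as a standalone userspace function. This is an illustration, not the actual fs/ceph/addr.c code: `struct ra_req`, `roundup_u64`, and `min_u64` are simplified stand-ins for `struct netfs_io_request` and the kernel helpers, and the backward expansion uses `%` in place of `div_u64_rem()`.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the fields of struct netfs_io_request
 * that the expansion logic touches. */
struct ra_req {
	uint64_t start;  /* byte offset of the request */
	uint64_t len;    /* requested length in bytes */
	uint64_t i_size; /* file size at request time */
};

/* roundup() for arbitrary (not necessarily power-of-two) alignments */
static uint64_t roundup_u64(uint64_t x, uint64_t to)
{
	return (x + to - 1) / to * to;
}

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Mirrors the patched ceph_netfs_expand_readahead(): expand forward to
 * the next stripe boundary first (capped by i_size and by the readahead
 * window max_len), then expand backward only if the result still fits. */
static void expand_readahead(struct ra_req *rreq, uint64_t stripe_unit,
			     uint64_t max_len)
{
	uint64_t end = rreq->start + rreq->len, new_end, blockoff;

	if (!max_len) /* readahead disabled, e.g. by POSIX_FADV_RANDOM */
		return;

	/* Forward: round end up to the next stripe boundary, bounded by
	 * the file size, unless the request already exceeds it. */
	new_end = min_u64(roundup_u64(end, stripe_unit), rreq->i_size);
	if (new_end > end && new_end <= rreq->start + max_len)
		rreq->len = new_end - rreq->start;

	/* Backward: pull start down to the previous stripe boundary. */
	blockoff = rreq->start % stripe_unit;
	if (rreq->len + blockoff <= max_len) {
		rreq->start -= blockoff;
		rreq->len += blockoff;
	}
}
```

For example, with a 4 MiB stripe unit and an 8 MiB readahead window, a 4 KiB read at offset 1 MiB expands to the whole first stripe (start 0, length 4 MiB), while a 2 MiB i_size caps the forward expansion at EOF.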
On Mon, May 15, 2023 at 09:20:44AM +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> Blindly expanding the readahead windows will cause unneccessary pagecache thrashing and also will introdue the network workload.
s/introdue/introduce/
> We should disable expanding the windows if the readahead is disabled and also shouldn't expand the windows too much.
>
> Expanding forward firstly instead of expanding backward for possible sequential reads.
>
> Bound `rreq->len` to the actual file size to restore the previous page cache usage.
>
> The posix_fadvise may change the maximum size of a file readahead.
>
> Cc: stable@vger.kernel.org
> Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
> URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu...
> URL: https://www.spinics.net/lists/ceph-users/msg76183.html
> Cc: Hu Weiwen <sehuww@mail.scut.edu.cn>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 93fff1a7373f..4b29777c01d7 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -188,16 +188,42 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
>  	struct inode *inode = rreq->inode;
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_file_layout *lo = &ci->i_layout;
> +	unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
> +	loff_t end = rreq->start + rreq->len, new_end;
> +	struct ceph_netfs_request_data *priv = rreq->netfs_priv;
> +	unsigned long max_len;
>  	u32 blockoff;
> -	u64 blockno;
>
> -	/* Expand the start downward */
> -	blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> -	rreq->start = blockno * lo->stripe_unit;
> -	rreq->len += blockoff;
> +	if (priv) {
> +		/* Readahead is disabled by posix_fadvise POSIX_FADV_RANDOM */
> +		if (priv->file_ra_disabled)
> +			max_pages = 0;
> +		else
> +			max_pages = priv->file_ra_pages;
> +
> +	}
>
> -	/* Now, round up the length to the next block */
> -	rreq->len = roundup(rreq->len, lo->stripe_unit);
> +	/* Readahead is disabled */
> +	if (!max_pages)
> +		return;
> +
> +	max_len = max_pages << PAGE_SHIFT;
> +
> +	/*
> +	 * Try to expand the length forward by rounding  up it to the next
An extra space between "rounding up".
Apart from the above two typos, LGTM.

Reviewed-by: Hu Weiwen <sehuww@mail.scut.edu.cn>

I also tested this patch with our workload. Reading the first 16k images from the ImageNet dataset (1.69GiB) takes about 1.8Gi of page cache (as reported by `free -h'). This is expected.

For the fadvise use-case, I use `fio' to do the test:

$ fio --name=rand --size=32M --fadvise_hint=1 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

after the test, the page cache increased by about 35Mi, which is expected. So if appropriate:

Tested-by: Hu Weiwen <sehuww@mail.scut.edu.cn>

However, also note that random reading from a large file without fadvise still suffers from degradation, e.g. this test:

$ fio --name=rand --size=32M --fadvise_hint=0 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G

will load nearly every page of the 2Gi test file into the page cache, although I only need 32Mi of them.
> +	 * block, but do not exceed the file size, unless the original
> +	 * request already exceeds it.
> +	 */
> +	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
> +	if (new_end > end && new_end <= rreq->start + max_len)
> +		rreq->len = new_end - rreq->start;
> +
> +	/* Try to expand the start downward */
> +	div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
> +	if (rreq->len + blockoff <= max_len) {
> +		rreq->start -= blockoff;
> +		rreq->len += blockoff;
> +	}
>  }
>
>  static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)
> --
> 2.40.1
On 6/5/23 14:56, 胡玮文 wrote:
> On Mon, May 15, 2023 at 09:20:44AM +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Blindly expanding the readahead windows will cause unneccessary pagecache thrashing and also will introdue the network workload.
>
> s/introdue/introduce/
Will fix it.
>> We should disable expanding the windows if the readahead is disabled and also shouldn't expand the windows too much.
>>
>> Expanding forward firstly instead of expanding backward for possible sequential reads.
>>
>> Bound `rreq->len` to the actual file size to restore the previous page cache usage.
>>
>> The posix_fadvise may change the maximum size of a file readahead.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
>> URL: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu...
>> URL: https://www.spinics.net/lists/ceph-users/msg76183.html
>> Cc: Hu Weiwen <sehuww@mail.scut.edu.cn>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>  fs/ceph/addr.c | 40 +++++++++++++++++++++++++++++++++-------
>>  1 file changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 93fff1a7373f..4b29777c01d7 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -188,16 +188,42 @@ static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq)
>>  	struct inode *inode = rreq->inode;
>>  	struct ceph_inode_info *ci = ceph_inode(inode);
>>  	struct ceph_file_layout *lo = &ci->i_layout;
>> +	unsigned long max_pages = inode->i_sb->s_bdi->ra_pages;
>> +	loff_t end = rreq->start + rreq->len, new_end;
>> +	struct ceph_netfs_request_data *priv = rreq->netfs_priv;
>> +	unsigned long max_len;
>>  	u32 blockoff;
>> -	u64 blockno;
>>
>> -	/* Expand the start downward */
>> -	blockno = div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
>> -	rreq->start = blockno * lo->stripe_unit;
>> -	rreq->len += blockoff;
>> +	if (priv) {
>> +		/* Readahead is disabled by posix_fadvise POSIX_FADV_RANDOM */
>> +		if (priv->file_ra_disabled)
>> +			max_pages = 0;
>> +		else
>> +			max_pages = priv->file_ra_pages;
>> +
>> +	}
>>
>> -	/* Now, round up the length to the next block */
>> -	rreq->len = roundup(rreq->len, lo->stripe_unit);
>> +	/* Readahead is disabled */
>> +	if (!max_pages)
>> +		return;
>> +
>> +	max_len = max_pages << PAGE_SHIFT;
>> +
>> +	/*
>> +	 * Try to expand the length forward by rounding  up it to the next
> An extra space between "rounding up".
Will fix it.
> Apart from the above two typos, LGTM.
>
> Reviewed-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
>
> I also tested this patch with our workload. Reading the first 16k images from the ImageNet dataset (1.69GiB) takes about 1.8Gi of page cache (as reported by `free -h'). This is expected.
>
> For the fadvise use-case, I use `fio' to do the test:
>
> $ fio --name=rand --size=32M --fadvise_hint=1 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G
>
> after the test, the page cache increased by about 35Mi, which is expected. So if appropriate:
>
> Tested-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
Thanks for your testing and review.
> However, also note that random reading from a large file without fadvise still suffers from degradation, e.g. this test:
>
> $ fio --name=rand --size=32M --fadvise_hint=0 --ioengine=libaio --iodepth=128 --rw=randread --bs=4k --filesize=2G
>
> will load nearly every page of the 2Gi test file into the page cache, although I only need 32Mi of them.
This is another issue, since this patch only fixes the blind expanding of the readahead windows. Please send a follow-up patch to fix it.
Thanks
- Xiubo
>> +	 * block, but do not exceed the file size, unless the original
>> +	 * request already exceeds it.
>> +	 */
>> +	new_end = min(round_up(end, lo->stripe_unit), rreq->i_size);
>> +	if (new_end > end && new_end <= rreq->start + max_len)
>> +		rreq->len = new_end - rreq->start;
>> +
>> +	/* Try to expand the start downward */
>> +	div_u64_rem(rreq->start, lo->stripe_unit, &blockoff);
>> +	if (rreq->len + blockoff <= max_len) {
>> +		rreq->start -= blockoff;
>> +		rreq->len += blockoff;
>> +	}
>>  }
>>
>>  static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq)
>> --
>> 2.40.1