In 2007, high-performance SSDs were still expensive. In order to save more space for the real workload or metadata, readahead I/Os for non-metadata were bypassed and not cached on the SSD.
Nowadays, SSD prices have dropped a lot and people can find larger SSDs at more comfortable prices. It is no longer necessary to bypass normal readahead I/Os to save SSD space.
This patch removes the code which checks the REQ_RAHEAD flag of the bio in check_should_bypass(), so that all readahead I/Os will be cached on the SSD.
NOTE: this patch still keeps the check of "REQ_META|REQ_PRIO" in should_writeback(), because we still want to cache metadata I/Os even when they are asynchronous.
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Eric Wheeler <bcache@linux.ewheeler.net>
---
 drivers/md/bcache/request.c | 9 ---------
 1 file changed, 9 deletions(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 73478a91a342..acc07c4f27ae 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -378,15 +378,6 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 	     op_is_write(bio_op(bio))))
 		goto skip;
 
-	/*
-	 * Flag for bypass if the IO is for read-ahead or background,
-	 * unless the read-ahead request is for metadata
-	 * (eg, for gfs2 or xfs).
-	 */
-	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
-	    !(bio->bi_opf & (REQ_META|REQ_PRIO)))
-		goto skip;
-
 	if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
 	    bio_sectors(bio) & (c->sb.block_size - 1)) {
 		pr_debug("skipping unaligned io");
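For context on the NOTE above: should_writeback() lives in drivers/md/bcache/writeback.h. The standalone sketch below is only a model of the idea, not the kernel source; the flag bit values, the function name and the surrounding conditions are simplified assumptions (the real function also checks cache mode, detach state, dirty stripes and writeback cutoffs). It just shows how a REQ_META|REQ_PRIO bio stays eligible for writeback even when it is asynchronous:

	/*
	 * Standalone model (NOT the kernel source) of the decision the NOTE
	 * above refers to: metadata/priority bios are still written back
	 * (cached) even when they are not synchronous.
	 *
	 * The flag values are illustrative only; the real ones live in
	 * include/linux/blk_types.h.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	#define REQ_META  (1u << 0)   /* illustrative bit positions only */
	#define REQ_PRIO  (1u << 1)
	#define REQ_SYNC  (1u << 2)

	/* Simplified stand-in for the tail of should_writeback(). */
	static bool model_should_writeback(unsigned int bi_opf, bool would_skip,
					   bool cache_has_room)
	{
		if (would_skip)
			return false;

		/* Sync I/O, metadata/priority I/O, or a roomy cache => writeback. */
		return (bi_opf & REQ_SYNC) ||
		       (bi_opf & (REQ_META | REQ_PRIO)) ||
		       cache_has_room;
	}

	int main(void)
	{
		/* An asynchronous metadata bio is still written back (cached). */
		printf("async metadata bio -> writeback? %d\n",
		       model_should_writeback(REQ_META, false, false));
		return 0;
	}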
On Sat, 4 Jan 2020, Coly Li wrote:
In 2007, high-performance SSDs were still expensive. In order to save more space for the real workload or metadata, readahead I/Os for non-metadata were bypassed and not cached on the SSD.
Nowadays, SSD prices have dropped a lot and people can find larger SSDs at more comfortable prices. It is no longer necessary to bypass normal readahead I/Os to save SSD space.
This patch removes the code which checks the REQ_RAHEAD flag of the bio in check_should_bypass(), so that all readahead I/Os will be cached on the SSD.
NOTE: this patch still keeps the check of "REQ_META|REQ_PRIO" in should_writeback(), because we still want to cache metadata I/Os even when they are asynchronous.
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Eric Wheeler <bcache@linux.ewheeler.net>
Acked-by: Eric Wheeler <bcache@linux.ewheeler.net>
 drivers/md/bcache/request.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 73478a91a342..acc07c4f27ae 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -378,15 +378,6 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 	     op_is_write(bio_op(bio))))
 		goto skip;
 
-	/*
-	 * Flag for bypass if the IO is for read-ahead or background,
-	 * unless the read-ahead request is for metadata
-	 * (eg, for gfs2 or xfs).
-	 */
-	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
-	    !(bio->bi_opf & (REQ_META|REQ_PRIO)))
-		goto skip;
-
 	if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
 	    bio_sectors(bio) & (c->sb.block_size - 1)) {
 		pr_debug("skipping unaligned io");

--
2.16.4
On 6 Jan 2020, Eric Wheeler spake thusly:
On Sat, 4 Jan 2020, Coly Li wrote:
In 2007, high-performance SSDs were still expensive. In order to save more space for the real workload or metadata, readahead I/Os for non-metadata were bypassed and not cached on the SSD.
It's also because readahead data is more likely to be useless.
Nowadays, SSD prices have dropped a lot and people can find larger SSDs at more comfortable prices. It is no longer necessary to bypass normal readahead I/Os to save SSD space.
Doesn't this reduce the utility of the cache by polluting it with unnecessary content? It seems to me that we need at least a *little* evidence that this change is beneficial. (I mean, it might be beneficial if, on average, the data that was read ahead is actually used.)
What happens to the cache hit rates when this change has been running for a while?
On 2020/1/14 8:34 PM, Nix wrote:
On 6 Jan 2020, Eric Wheeler spake thusly:
On Sat, 4 Jan 2020, Coly Li wrote:
In 2007, high-performance SSDs were still expensive. In order to save more space for the real workload or metadata, readahead I/Os for non-metadata were bypassed and not cached on the SSD.
It's also because readahead data is more likely to be useless.
Nowadays, SSD prices have dropped a lot and people can find larger SSDs at more comfortable prices. It is no longer necessary to bypass normal readahead I/Os to save SSD space.
Hi Nix,
Doesn't this reduce the utility of the cache by polluting it with unnecessary content? It seems to me that we need at least a *little* evidence that this change is beneficial. (I mean, it might be beneficial if, on average, the data that was read ahead is actually used.)
What happens to the cache hit rates when this change has been running for a while?
I have received two reports offline, sent directly to me: one is from a github email address and was forwarded to me by Jens, and the other is from a local storage startup in China.
The first report complains that a desktop-PC benchmark dropped by about 50%, and the root cause was traced to commit b41c9b0 ("bcache: update bio->bi_opf bypass/writeback REQ_ flag hints").
The second report complains that their small-file workload (mixed read and write) shows a performance drop of around 20%+, and the suspect change is again the readahead restriction.
The second reporter verified this patch and confirms the performance issue is gone. I don't know who the first reporter is, so there has been no response so far.
I don't have exact hit-rate numbers because the reporters did not provide them (BTW, because readahead requests were bypassed, I believe they would not have been counted in the hit rate anyway). But from the reports and one verification, IMHO this change makes sense.
Thanks.
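To illustrate the hit-rate point above: bcache keeps separate counters for bypassed requests, so a bypassed readahead never contributes to the ordinary hit/miss ratio. The small userspace model below is illustrative only; the struct and function names are made up, and the real accounting lives in drivers/md/bcache/stats.c:

	/*
	 * Illustrative model (not the kernel source) of per-request cache
	 * accounting: a request flagged as "bypass" goes into separate bypass
	 * counters, so it never shows up in the ordinary hit rate.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	struct cache_stats {
		unsigned long hits, misses;
		unsigned long bypass_hits, bypass_misses;
	};

	static void mark_request(struct cache_stats *st, bool hit, bool bypass)
	{
		if (bypass) {
			/* Bypassed I/O (e.g. readahead before this patch). */
			if (hit)
				st->bypass_hits++;
			else
				st->bypass_misses++;
		} else {
			if (hit)
				st->hits++;
			else
				st->misses++;
		}
	}

	int main(void)
	{
		struct cache_stats st = { 0 };

		mark_request(&st, false, true);   /* bypassed readahead miss */
		mark_request(&st, true,  false);  /* ordinary cached read hit */

		printf("hit ratio counts only %lu hit(s) / %lu lookup(s); %lu bypassed\n",
		       st.hits, st.hits + st.misses,
		       st.bypass_hits + st.bypass_misses);
		return 0;
	}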
On 15 Jan 2020, Coly Li stated:
I have received two reports offline, sent directly to me: one is from a github email address and was forwarded to me by Jens, and the other is from a local storage startup in China.
The first report complains that a desktop-PC benchmark dropped by about 50%, and the root cause was traced to commit b41c9b0 ("bcache: update bio->bi_opf bypass/writeback REQ_ flag hints").
The second report complains that their small-file workload (mixed read and write) shows a performance drop of around 20%+, and the suspect change is again the readahead restriction.
The second reporter verified this patch and confirms the performance issue is gone. I don't know who the first reporter is, so there has been no response so far.
Hah! OK, looks like readahead is frequently-enough useful that caching it is better than not caching it :) I guess the problem is that if you don't cache it, it never gets cached at all even if it was useful, so the next time round you'll end up having to readahead it again :/
One wonders what effect this will have on a bcache-atop-RAID: will we end up caching whole stripes most of the time?
On 2020/1/15 8:39 PM, Nix wrote:
On 15 Jan 2020, Coly Li stated:
I have received two reports offline, sent directly to me: one is from a github email address and was forwarded to me by Jens, and the other is from a local storage startup in China.
The first report complains that a desktop-PC benchmark dropped by about 50%, and the root cause was traced to commit b41c9b0 ("bcache: update bio->bi_opf bypass/writeback REQ_ flag hints").
The second report complains that their small-file workload (mixed read and write) shows a performance drop of around 20%+, and the suspect change is again the readahead restriction.
The second reporter verified this patch and confirms the performance issue is gone. I don't know who the first reporter is, so there has been no response so far.
Hah! OK, looks like readahead is frequently-enough useful that caching it is better than not caching it :) I guess the problem is that if you don't cache it, it never gets cached at all even if it was useful, so the next time round you'll end up having to readahead it again :/
Yes, this is the problem. The bypassed data never gets a chance to go into the cache at all.
One wonders what effect this will have on a bcache-atop-RAID: will we end up caching whole stripes most of the time?
In my I/O pressure testing, I have a raid0 backing device assembled from 3 SSDs. From my observation, a whole stripe is not cached for small read/write requests. Stripe-size alignment is handled in the md raid layer; even if md returns a bio that sits in a stripe-sized memory chunk, bcache only takes the bi_size part for its I/O.
Thanks.
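A toy calculation of the stripe point above (not bcache code; the stripe size and request geometry are made-up numbers): the extent that ends up cached follows the bio's own start sector and bi_size, not the raid0 stripe that contains it.

	/*
	 * Toy illustration (not bcache code): the cached extent follows the
	 * bio's own sector range, not the RAID stripe that contains it.
	 */
	#include <stdio.h>

	int main(void)
	{
		const unsigned long stripe_sectors = 1024;  /* e.g. 512 KiB stripe, made up */
		const unsigned long bi_sector = 2080;       /* request start (512 B sectors) */
		const unsigned long bi_size_sectors = 8;    /* a 4 KiB request */

		unsigned long stripe_start = bi_sector / stripe_sectors * stripe_sectors;

		printf("stripe covers sectors [%lu, %lu), but the cache only gets [%lu, %lu)\n",
		       stripe_start, stripe_start + stripe_sectors,
		       bi_sector, bi_sector + bi_size_sectors);
		return 0;
	}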