A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a, bi_partno, instead of using bdev->bd_contains and encoding the partition information in the bi_bdev field. __bio_clone_fast was changed to copy the disk information, but not the partition information. At minimum, this regressed bcache and caused data corruption.
Signed-off-by: Michael Lyle mlyle@lyle.org Fixes: 74d46992e0d9dee7f1f376de0d56d31614c8a17a Reported-by: Pavel Goran via-bcache@pvgoran.name Reported-by: Campbell Steven casteven@gmail.com Cc: Christoph Hellwig hch@lst.de Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org --- block/bio.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/block/bio.c b/block/bio.c index 101c2a9b5481..33fa6b4af312 100644 --- a/block/bio.c +++ b/block/bio.c @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) * so we don't set nor calculate new physical/hw segment counts here */ bio->bi_disk = bio_src->bi_disk; + bio->bi_partno = bio_src->bi_partno; bio_set_flag(bio, BIO_CLONED); bio->bi_opf = bio_src->bi_opf; bio->bi_write_hint = bio_src->bi_write_hint;
On 17/11/2017 3:47 PM, Michael Lyle wrote:
A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a, bi_partno, instead of using bdev->bd_contains and encoding the partition information in the bi_bdev field. __bio_clone_fast was changed to copy the disk information, but not the partition information. At minimum, this regressed bcache and caused data corruption.
Hi Michael,
Thanks for the fix, it looks good to me.
Signed-off-by: Michael Lyle mlyle@lyle.org Fixes: 74d46992e0d9dee7f1f376de0d56d31614c8a17a Reported-by: Pavel Goran via-bcache@pvgoran.name Reported-by: Campbell Steven casteven@gmail.com
Reviewed-by: Coly Li colyli@suse.de
Coly Li
Cc: Christoph Hellwig hch@lst.de Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org
block/bio.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/block/bio.c b/block/bio.c index 101c2a9b5481..33fa6b4af312 100644 --- a/block/bio.c +++ b/block/bio.c @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) * so we don't set nor calculate new physical/hw segment counts here */ bio->bi_disk = bio_src->bi_disk;
- bio->bi_partno = bio_src->bi_partno; bio_set_flag(bio, BIO_CLONED); bio->bi_opf = bio_src->bi_opf; bio->bi_write_hint = bio_src->bi_write_hint;
On Thu, Nov 16, 2017 at 11:47:25PM -0800, Michael Lyle wrote:
A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a, bi_partno, instead of using bdev->bd_contains and encoding the partition information in the bi_bdev field. __bio_clone_fast was changed to copy the disk information, but not the partition information. At minimum, this regressed bcache and caused data corruption.
Signed-off-by: Michael Lyle mlyle@lyle.org Fixes: 74d46992e0d9dee7f1f376de0d56d31614c8a17a Reported-by: Pavel Goran via-bcache@pvgoran.name Reported-by: Campbell Steven casteven@gmail.com Cc: Christoph Hellwig hch@lst.de Cc: Jens Axboe axboe@kernel.dk Cc: stable@vger.kernel.org
block/bio.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/block/bio.c b/block/bio.c index 101c2a9b5481..33fa6b4af312 100644 --- a/block/bio.c +++ b/block/bio.c @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) * so we don't set nor calculate new physical/hw segment counts here */ bio->bi_disk = bio_src->bi_disk;
- bio->bi_partno = bio_src->bi_partno; bio_set_flag(bio, BIO_CLONED); bio->bi_opf = bio_src->bi_opf; bio->bi_write_hint = bio_src->bi_write_hint;
Reviewed-by: Ming Lei ming.lei@redhat.com
Looks good,
Reviewed-by: Christoph Hellwig hch@lst.de
On 11/17/2017 12:47 AM, Michael Lyle wrote:
A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a, bi_partno, instead of using bdev->bd_contains and encoding the partition information in the bi_bdev field. __bio_clone_fast was changed to copy the disk information, but not the partition information. At minimum, this regressed bcache and caused data corruption.
That's not good... Fix looks good to me, I'll queue this up for a pull today. Thanks for bisecting this, Michael.
On Thu, 2017-11-16 at 23:47 -0800, Michael Lyle wrote:
diff --git a/block/bio.c b/block/bio.c index 101c2a9b5481..33fa6b4af312 100644 --- a/block/bio.c +++ b/block/bio.c @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) * so we don't set nor calculate new physical/hw segment counts here */ bio->bi_disk = bio_src->bi_disk;
- bio->bi_partno = bio_src->bi_partno; bio_set_flag(bio, BIO_CLONED); bio->bi_opf = bio_src->bi_opf; bio->bi_write_hint = bio_src->bi_write_hint;
Have you considered to use bio_copy_dev() instead of open-coding it?
Additionally, there is more code that copies these fields, e.g. the code in bio_clone_bioset(). Shouldn't that code be modified such that it also copies bi_partno?
How about the following class of assignments in drivers/md/raid1.c:
mbio->bi_disk = (void *)conf->mirrors[i].rdev;
Should these assignments perhaps be followed by a mbio->bi_partno assignment?
How about the following class of assignments in the NVMe code:
bio->bi_disk = disk;
Should these assignments perhaps be followed by a bio->bi_partno assignment?
Thanks,
Bart.
Jens & everyone-- thanks for the speedy review and handling. I've updated my test cases to ensure that volumes from old releases work, even when I "don't think" there's been a disk format change.
Bart--
On 11/17/2017 08:50 AM, Bart Van Assche wrote:
Have you considered to use bio_copy_dev() instead of open-coding it?
One could... Right now almost all the uses of bio_copy_dev are in bcache and they need to change for other reasons. (e.g. macro uses parameter more than once, function is passed in as parameter). There's a whole lot of places to change if it's desired to make bio_copy_dev universally used to copy device information.
Additionally, there is more code that copies these fields, e.g. the code in bio_clone_bioset(). Shouldn't that code be modified such that it also copies bi_partno?
Yes, when I was grepping around there were other things that looked possibly unsafe. I don't have test environments for all of these other subsystems.
I wanted to get the minimal fix for this in, though, because people are actively losing data to the problem it triggers with bcache.
Mike
On Fri, Nov 17, 2017 at 04:50:39PM +0000, Bart Van Assche wrote:
How about the following class of assignments in drivers/md/raid1.c:
mbio->bi_disk = (void *)conf->mirrors[i].rdev;
Should these assignments perhaps be followed by a mbio->bi_partno assignment?
No. They assign a struct md_rdev to the bi_disk pointer, abusing it for internal storage. They should not assign mbio->bi_partno. Instead we should figure out a way to get rid of this.
How about the following class of assignments in the NVMe code:
bio->bi_disk = disk;
Should these assignments perhaps be followed by a bio->bi_partno assignment?
No, the multipath code always works on whole namespaces, not partitions.
On 17 November 2017 at 20:47, Michael Lyle mlyle@lyle.org wrote:
A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a, bi_partno, instead of using bdev->bd_contains and encoding the partition information in the bi_bdev field. __bio_clone_fast was changed to copy the disk information, but not the partition information. At minimum, this regressed bcache and caused data corruption.
Thanks alot for this Michael, I will run it up on our test servers today.
Campbell
linux-stable-mirror@lists.linaro.org