There is a bug in the aoe driver module where every forcible removal of an aoe device (eg. "rmmod aoe" with aoe devices available or "aoe-flush ex.x") leads to a page fault. The code in freedev() calls blk_mq_free_tag_set() before running blk_cleanup_queue() which leads to this issue (drivers/block/aoe/aoedev.c L281ff). This issue was fixed upstream in commit 6560ec9 (aoe: use blk_mq_alloc_disk and blk_cleanup_disk) with the introduction and use of the function blk_cleanup_disk().
This patch applies to kernels 5.4 and 5.10.
The function calls are reordered to match the behavior of blk_cleanup_disk() to mitigate this issue.
Fixes: 3582dd2 (aoe: convert aoeblk to blk-mq) Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215647 Signed-off-by: Valentin Kleibel valentin@vrvis.at --- drivers/block/aoe/aoedev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index e2ea2356da06..08c98ea724ea 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -277,9 +277,9 @@ freedev(struct aoedev *d) if (d->gd) { aoedisk_rm_debugfs(d); del_gendisk(d->gd); + blk_cleanup_queue(d->blkq); put_disk(d->gd); blk_mq_free_tag_set(&d->tag_set); - blk_cleanup_queue(d->blkq); } t = d->targets; e = t + d->ntargets;
On Thu, Mar 10, 2022 at 12:53:01PM +0100, Valentin Kleibel wrote:
There is a bug in the aoe driver module where every forcible removal of an aoe device (eg. "rmmod aoe" with aoe devices available or "aoe-flush ex.x") leads to a page fault. The code in freedev() calls blk_mq_free_tag_set() before running blk_cleanup_queue() which leads to this issue (drivers/block/aoe/aoedev.c L281ff). This issue was fixed upstream in commit 6560ec9 (aoe: use blk_mq_alloc_disk and blk_cleanup_disk) with the introduction and use of the function blk_cleanup_disk().
This patch applies to kernels 5.4 and 5.10.
We need a fix for Linus's tree first before we can backport anything to older kernels. Does this also work there?
The function calls are reordered to match the behavior of blk_cleanup_disk() to mitigate this issue.
Fixes: 3582dd2 (aoe: convert aoeblk to blk-mq)
A few more digits in the sha1 here would be good, otherwise our tools will complain. It should look like: Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
thanks,
greg k-h
On 10/03/2022 13:03, Greg Kroah-Hartman wrote:
This patch applies to kernels 5.4 and 5.10.
We need a fix for Linus's tree first before we can backport anything to older kernels. Does this also work there?
It is fixed in Linus' tree starting with 5.14. The patch reproduces the behavior of the current master introduced in commit 6560ec961a08 (aoe: use blk_mq_alloc_disk and blk_cleanup_disk).
Fixes: 3582dd2 (aoe: convert aoeblk to blk-mq)
A few more digits in the sha1 here would be good, otherwise our tools will complain. It should look like: Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
thanks for the hint, i will do so in the future.
cheers, valentin
On Thu, Mar 10, 2022 at 01:24:38PM +0100, Valentin Kleibel wrote:
On 10/03/2022 13:03, Greg Kroah-Hartman wrote:
This patch applies to kernels 5.4 and 5.10.
We need a fix for Linus's tree first before we can backport anything to older kernels. Does this also work there?
It is fixed in Linus' tree starting with 5.14.
What commit fixes it there? Why not just backport that one only?
thanks,
greg k-h
On 10/03/2022 13:26, Greg Kroah-Hartman wrote:
On Thu, Mar 10, 2022 at 01:24:38PM +0100, Valentin Kleibel wrote:
On 10/03/2022 13:03, Greg Kroah-Hartman wrote:
This patch applies to kernels 5.4 and 5.10.
We need a fix for Linus's tree first before we can backport anything to older kernels. Does this also work there?
It is fixed in Linus' tree starting with 5.14.
What commit fixes it there? Why not just backport that one only?
commit 6560ec961a08 (aoe: use blk_mq_alloc_disk and blk_cleanup_disk) This commit uses the function blk_cleanup_disk() in freedev() in drivers/block/aoe/aoedev.c which fixes the issue. The function was introduced in f525464a8000 (block: add blk_alloc_disk and blk_cleanup_disk APIs): void blk_cleanup_disk(struct gendisk *disk) { blk_cleanup_queue(disk->queue); put_disk(disk); } EXPORT_SYMBOL(blk_cleanup_disk);
I tried to backport the fix to the lts kernels without introducing a new API by just adjusting the order of the two function calls. Is it preferable to introduce and use the function blk_cleanup_disk()?
cheers, valentin
On Thu, Mar 10, 2022 at 01:55:25PM +0100, Valentin Kleibel wrote:
On 10/03/2022 13:26, Greg Kroah-Hartman wrote:
On Thu, Mar 10, 2022 at 01:24:38PM +0100, Valentin Kleibel wrote:
On 10/03/2022 13:03, Greg Kroah-Hartman wrote:
This patch applies to kernels 5.4 and 5.10.
We need a fix for Linus's tree first before we can backport anything to older kernels. Does this also work there?
It is fixed in Linus' tree starting with 5.14.
What commit fixes it there? Why not just backport that one only?
commit 6560ec961a08 (aoe: use blk_mq_alloc_disk and blk_cleanup_disk) This commit uses the function blk_cleanup_disk() in freedev() in drivers/block/aoe/aoedev.c which fixes the issue. The function was introduced in f525464a8000 (block: add blk_alloc_disk and blk_cleanup_disk APIs): void blk_cleanup_disk(struct gendisk *disk) { blk_cleanup_queue(disk->queue); put_disk(disk); } EXPORT_SYMBOL(blk_cleanup_disk);
I tried to backport the fix to the lts kernels without introducing a new API by just adjusting the order of the two function calls. Is it preferable to introduce and use the function blk_cleanup_disk()?
I do not know, sorry. But we prefer the upstream commits as much as possible as one-off changes like this are almost always buggy and wrong in the end.
Try doing the backports and see what that looks like please.
thanks,
greg k-h
I do not know, sorry. But we prefer the upstream commits as much as possible as one-off changes like this are almost always buggy and wrong in the end.
Try doing the backports and see what that looks like please.
I did the backports now but Christoph Hellwig already pointed out:
No idea what is hidden in bugzilla, but the blk_mq_alloc_disk changes aren't easily backportable.
[https://lore.kernel.org/all/20220308060916.GB23629@lst.de/]
Therefore I cherry-picked only the changes for blk_cleanup_disk as they are much simpler than the changes to blk_mq_alloc_disk.
I kept the original commit messages, please tell me if you feel they should be changed or do so yourself.
Please apply to 5.4 and 5.10.
Cheers, Valentin
changelog: v2: * cherry pick from upstream commits instead of creating a new patch
Add two new APIs to allocate and free a gendisk including the request_queue for use with BIO based drivers. This is to avoid boilerplate code in drivers.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Link: https://lore.kernel.org/r/20210521055116.1053587-6-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk (cherry picked from commit f525464a8000f092c20b00eead3eaa9d849c599e) Fixes: 3582dd291788 (aoe: convert aoeblk to blk-mq) Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215647 Signed-off-by: Valentin Kleibel valentin@vrvis.at --- block/genhd.c | 15 +++++++++++++++ include/linux/genhd.h | 1 + 2 files changed, 16 insertions(+)
diff --git a/block/genhd.c b/block/genhd.c index 796baf761202..421cad085502 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1836,6 +1836,21 @@ void put_disk_and_module(struct gendisk *disk) } } EXPORT_SYMBOL(put_disk_and_module); +/** + * blk_cleanup_disk - shutdown a gendisk allocated by blk_alloc_disk + * @disk: gendisk to shutdown + * + * Mark the queue hanging off @disk DYING, drain all pending requests, then mark + * the queue DEAD, destroy and put it and the gendisk structure. + * + * Context: can sleep + */ +void blk_cleanup_disk(struct gendisk *disk) +{ + blk_cleanup_queue(disk->queue); + put_disk(disk); +} +EXPORT_SYMBOL(blk_cleanup_disk);
static void set_disk_ro_uevent(struct gendisk *gd, int ro) { diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 03da3f603d30..b7b180d3734a 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -369,6 +369,7 @@ extern void blk_unregister_region(dev_t devt, unsigned long range); #define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE)
int register_blkdev(unsigned int major, const char *name); +void blk_cleanup_disk(struct gendisk *disk); void unregister_blkdev(unsigned int major, const char *name);
void revalidate_disk_size(struct gendisk *disk, bool verbose);
On Thu, Mar 31, 2022 at 12:00:08PM +0200, Valentin Kleibel wrote:
Add two new APIs to allocate and free a gendisk including the request_queue for use with BIO based drivers. This is to avoid boilerplate code in drivers.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Link: https://lore.kernel.org/r/20210521055116.1053587-6-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk (cherry picked from commit f525464a8000f092c20b00eead3eaa9d849c599e) Fixes: 3582dd291788 (aoe: convert aoeblk to blk-mq) Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215647 Signed-off-by: Valentin Kleibel valentin@vrvis.at
block/genhd.c | 15 +++++++++++++++ include/linux/genhd.h | 1 + 2 files changed, 16 insertions(+)
diff --git a/block/genhd.c b/block/genhd.c index 796baf761202..421cad085502 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1836,6 +1836,21 @@ void put_disk_and_module(struct gendisk *disk) } } EXPORT_SYMBOL(put_disk_and_module); +/**
- blk_cleanup_disk - shutdown a gendisk allocated by blk_alloc_disk
- @disk: gendisk to shutdown
- Mark the queue hanging off @disk DYING, drain all pending requests, then
mark
- the queue DEAD, destroy and put it and the gendisk structure.
- Context: can sleep
- */
+void blk_cleanup_disk(struct gendisk *disk) +{
blk_cleanup_queue(disk->queue);
put_disk(disk);
+} +EXPORT_SYMBOL(blk_cleanup_disk);
static void set_disk_ro_uevent(struct gendisk *gd, int ro) { diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 03da3f603d30..b7b180d3734a 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -369,6 +369,7 @@ extern void blk_unregister_region(dev_t devt, unsigned long range); #define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE)
int register_blkdev(unsigned int major, const char *name); +void blk_cleanup_disk(struct gendisk *disk); void unregister_blkdev(unsigned int major, const char *name);
void revalidate_disk_size(struct gendisk *disk, bool verbose);
This backport looks to be incomplete, and is also totally whitespace damaged and can not be applied at all :(
Please fix both up and resend.
thanks,
greg k-h
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Link: https://lore.kernel.org/r/20210602065345.355274-17-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk (cherry picked from commit 6560ec961a080944f8d5e1fef17b771bfaf189cb) Fixes: 3582dd291788 (aoe: convert aoeblk to blk-mq) Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215647 Signed-off-by: Valentin Kleibel valentin@vrvis.at --- drivers/block/aoe/aoedev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c index e2ea2356da06..c5753c6bfe80 100644 --- a/drivers/block/aoe/aoedev.c +++ b/drivers/block/aoe/aoedev.c @@ -277,9 +277,8 @@ freedev(struct aoedev *d) if (d->gd) { aoedisk_rm_debugfs(d); del_gendisk(d->gd); - put_disk(d->gd); + blk_cleanup_disk(d->gd); blk_mq_free_tag_set(&d->tag_set); - blk_cleanup_queue(d->blkq); } t = d->targets; e = t + d->ntargets;
linux-stable-mirror@lists.linaro.org