Turns out we are sending a lot more hotplug events then we need, and
this is causing some pretty serious issues. Currently, we call
intel_dp_mst_resume() in i915_drm_resume() well before we have any sort
of hotplugging setup. This is a pretty big problem, because in practice
it will generally result in throwing the power domain refcounts out of
wack.
For instance: On my T480s, removing a previously connected topology
before the system finishes resuming causes
drm_kms_helper_hotplug_event() to be called before HPD is setup again,
which causes us to do a connector reprobe, which then causes
intel_dp_detect() to be called on all DP devices -including- the eDP
display. From there, intel_dp_detect() is run on the eDP display which
triggers DPCD transactions. Those DPCD transactions then cause us to
call edp_panel_vdd_on(), which then causes us to grab an additional
wakeref to the relevant power wells (PORT_DDI_A_IO on this machine).
>From there, this wakeref is never released which then causes the next
suspend/resume cycle to entirely fail due to the hardware not being
powered off correctly.
This sucks really badly, and I don't see any decent way to actually fix
this in intel_dp_detect() easily. Additionally, I don't even think it'd
be worth the time now since we're not expecting to handle any kind of
connector reprobing at the point in which we call intel_dp_mst_resume(),
but we also can't move intel_dp_mst_resume() any higher in the resume
process since MST topologies need to be resumed before
intel_display_resume() is called.
However, there's a light at the end of the tunnel! After reading through
a lot of code dozens of times, it occurred to me that we -never-
actually need to send hotplug events when calling
drm_dp_mst_topology_mgr_set_mst() since we send hotplug events in
drm_dp_destroy_connector_work(). Imagine that!
So, since we only seem to call intel_dp_mst_check_status() to disable
MST on the encoder in question and then send a hotplug, get rid of this
and instead just disable MST mode when a hub fails in
intel_dp_mst_resume(). From there, drm_dp_destroy_connector_work() will
eventually send the hotplug event.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite(a)gmail.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v3.17+
---
drivers/gpu/drm/i915/intel_dp.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 681e88405ada..c2399acf177b 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -7096,7 +7096,10 @@ void intel_dp_mst_resume(struct drm_i915_private *dev_priv)
continue;
ret = drm_dp_mst_topology_mgr_resume(&intel_dp->mst_mgr);
- if (ret)
- intel_dp_check_mst_status(intel_dp);
+ if (ret) {
+ intel_dp->is_mst = false;
+ drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr,
+ false);
+ }
}
}
--
2.20.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a1e1cb72d96491277ede8d257ce6b48a381dd336 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Thu, 17 Jan 2019 10:48:01 -0500
Subject: [PATCH] dm: fix redundant IO accounting for bios that need splitting
The risk of redundant IO accounting was not taken into consideration
when commit 18a25da84354 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().
Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry. This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.
Before this fix:
/dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
bios are split on 32k boundaries.
# fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers
with debugging added:
[103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
[103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
[103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
...
16M written yet 136M (278528 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
278528
After this fix:
16M written and 16M (32768 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
32768
Fixes: 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable(a)vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney(a)redhat.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fcb97b0a5743..fbadda68e23b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1584,6 +1584,9 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
ci->sector = bio->bi_iter.bi_sector;
}
+#define __dm_part_stat_sub(part, field, subnd) \
+ (part_stat_get(part, field) -= (subnd))
+
/*
* Entry point to split a bio into clones and submit them to the targets.
*/
@@ -1638,6 +1641,19 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
GFP_NOIO, &md->queue->bio_split);
ci.io->orig_bio = b;
+
+ /*
+ * Adjust IO stats for each split, otherwise upon queue
+ * reentry there will be redundant IO accounting.
+ * NOTE: this is a stop-gap fix, a proper fix involves
+ * significant refactoring of DM core's bio splitting
+ * (by eliminating DM's splitting and just using bio_split)
+ */
+ part_stat_lock();
+ __dm_part_stat_sub(&dm_disk(md)->part0,
+ sectors[op_stat_group(bio_op(bio))], ci.sector_count);
+ part_stat_unlock();
+
bio_chain(b, bio);
ret = generic_make_request(bio);
break;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a1e1cb72d96491277ede8d257ce6b48a381dd336 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Thu, 17 Jan 2019 10:48:01 -0500
Subject: [PATCH] dm: fix redundant IO accounting for bios that need splitting
The risk of redundant IO accounting was not taken into consideration
when commit 18a25da84354 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().
Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry. This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.
Before this fix:
/dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
bios are split on 32k boundaries.
# fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers
with debugging added:
[103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
[103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
[103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
...
16M written yet 136M (278528 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
278528
After this fix:
16M written and 16M (32768 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
32768
Fixes: 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable(a)vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney(a)redhat.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fcb97b0a5743..fbadda68e23b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1584,6 +1584,9 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
ci->sector = bio->bi_iter.bi_sector;
}
+#define __dm_part_stat_sub(part, field, subnd) \
+ (part_stat_get(part, field) -= (subnd))
+
/*
* Entry point to split a bio into clones and submit them to the targets.
*/
@@ -1638,6 +1641,19 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
GFP_NOIO, &md->queue->bio_split);
ci.io->orig_bio = b;
+
+ /*
+ * Adjust IO stats for each split, otherwise upon queue
+ * reentry there will be redundant IO accounting.
+ * NOTE: this is a stop-gap fix, a proper fix involves
+ * significant refactoring of DM core's bio splitting
+ * (by eliminating DM's splitting and just using bio_split)
+ */
+ part_stat_lock();
+ __dm_part_stat_sub(&dm_disk(md)->part0,
+ sectors[op_stat_group(bio_op(bio))], ci.sector_count);
+ part_stat_unlock();
+
bio_chain(b, bio);
ret = generic_make_request(bio);
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a1e1cb72d96491277ede8d257ce6b48a381dd336 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Thu, 17 Jan 2019 10:48:01 -0500
Subject: [PATCH] dm: fix redundant IO accounting for bios that need splitting
The risk of redundant IO accounting was not taken into consideration
when commit 18a25da84354 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().
Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry. This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.
Before this fix:
/dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
bios are split on 32k boundaries.
# fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers
with debugging added:
[103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
[103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
[103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
...
16M written yet 136M (278528 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
278528
After this fix:
16M written and 16M (32768 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
32768
Fixes: 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable(a)vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney(a)redhat.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fcb97b0a5743..fbadda68e23b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1584,6 +1584,9 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
ci->sector = bio->bi_iter.bi_sector;
}
+#define __dm_part_stat_sub(part, field, subnd) \
+ (part_stat_get(part, field) -= (subnd))
+
/*
* Entry point to split a bio into clones and submit them to the targets.
*/
@@ -1638,6 +1641,19 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
GFP_NOIO, &md->queue->bio_split);
ci.io->orig_bio = b;
+
+ /*
+ * Adjust IO stats for each split, otherwise upon queue
+ * reentry there will be redundant IO accounting.
+ * NOTE: this is a stop-gap fix, a proper fix involves
+ * significant refactoring of DM core's bio splitting
+ * (by eliminating DM's splitting and just using bio_split)
+ */
+ part_stat_lock();
+ __dm_part_stat_sub(&dm_disk(md)->part0,
+ sectors[op_stat_group(bio_op(bio))], ci.sector_count);
+ part_stat_unlock();
+
bio_chain(b, bio);
ret = generic_make_request(bio);
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a1e1cb72d96491277ede8d257ce6b48a381dd336 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Thu, 17 Jan 2019 10:48:01 -0500
Subject: [PATCH] dm: fix redundant IO accounting for bios that need splitting
The risk of redundant IO accounting was not taken into consideration
when commit 18a25da84354 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().
Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry. This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.
Before this fix:
/dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
bios are split on 32k boundaries.
# fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers
with debugging added:
[103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
[103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
[103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
...
16M written yet 136M (278528 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
278528
After this fix:
16M written and 16M (32768 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
32768
Fixes: 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable(a)vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney(a)redhat.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fcb97b0a5743..fbadda68e23b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1584,6 +1584,9 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
ci->sector = bio->bi_iter.bi_sector;
}
+#define __dm_part_stat_sub(part, field, subnd) \
+ (part_stat_get(part, field) -= (subnd))
+
/*
* Entry point to split a bio into clones and submit them to the targets.
*/
@@ -1638,6 +1641,19 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
GFP_NOIO, &md->queue->bio_split);
ci.io->orig_bio = b;
+
+ /*
+ * Adjust IO stats for each split, otherwise upon queue
+ * reentry there will be redundant IO accounting.
+ * NOTE: this is a stop-gap fix, a proper fix involves
+ * significant refactoring of DM core's bio splitting
+ * (by eliminating DM's splitting and just using bio_split)
+ */
+ part_stat_lock();
+ __dm_part_stat_sub(&dm_disk(md)->part0,
+ sectors[op_stat_group(bio_op(bio))], ci.sector_count);
+ part_stat_unlock();
+
bio_chain(b, bio);
ret = generic_make_request(bio);
break;
Realtek bluetooth may not work after reboot:
[ 12.446130] Bluetooth: hci0: RTL: rtl: unknown IC info, lmp subver a99e, hci rev 826c, hci ver 0008
This is a regression introduced by commit 26503ad25de8 ("Bluetooth:
btrtl: split the device initialization into smaller parts"). The new
logic errors out early when no matching IC info can be found, in this
case it means the firmware is already loaded.
So let's assume the firmware is already loaded when we can't find
matching IC info, like the old logic did.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201921
Fixes: 26503ad25de8 ("Bluetooth: btrtl: split the device initialization into smaller parts")
Cc: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
---
drivers/bluetooth/btrtl.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/bluetooth/btrtl.c b/drivers/bluetooth/btrtl.c
index 41405de27d66..c91bba00df4e 100644
--- a/drivers/bluetooth/btrtl.c
+++ b/drivers/bluetooth/btrtl.c
@@ -552,10 +552,9 @@ struct btrtl_device_info *btrtl_initialize(struct hci_dev *hdev,
hdev->bus);
if (!btrtl_dev->ic_info) {
- rtl_dev_err(hdev, "rtl: unknown IC info, lmp subver %04x, hci rev %04x, hci ver %04x",
+ rtl_dev_info(hdev, "rtl: unknown IC info, lmp subver %04x, hci rev %04x, hci ver %04x",
lmp_subver, hci_rev, hci_ver);
- ret = -EINVAL;
- goto err_free;
+ return btrtl_dev;
}
if (btrtl_dev->ic_info->has_rom_version) {
@@ -610,6 +609,11 @@ int btrtl_download_firmware(struct hci_dev *hdev,
* standard btusb. Once that firmware is uploaded, the subver changes
* to a different value.
*/
+ if (!btrtl_dev->ic_info) {
+ rtl_dev_info(hdev, "rtl: assuming no firmware upload needed\n");
+ return 0;
+ }
+
switch (btrtl_dev->ic_info->lmp_subver) {
case RTL_ROM_LMP_8723A:
case RTL_ROM_LMP_3499:
--
2.17.1
For OUT endpoints, zero-length transfers require MaxPacketSize buffer as
per the DWC_usb3 programming guide 3.30a section 4.2.3.3.
This patch fixes this by explicitly checking zero length
transfer to correctly pad up to MaxPacketSize.
Fixes: c6267a51639b ("usb: dwc3: gadget: align transfers to wMaxPacketSize")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tejas Joglekar <joglekar(a)synopsys.com>
---
Changes in v3:
- Enclose summary for fixes with " " and no wrap
Changes in v2:
- Remove sg patch hunk
- Added fixes and stable tag
drivers/usb/dwc3/gadget.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index bed2ff4..6c9b76b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1119,7 +1119,7 @@ static void dwc3_prepare_one_trb_linear(struct dwc3_ep *dep,
unsigned int maxp = usb_endpoint_maxp(dep->endpoint.desc);
unsigned int rem = length % maxp;
- if (rem && usb_endpoint_dir_out(dep->endpoint.desc)) {
+ if ((!length || rem) && usb_endpoint_dir_out(dep->endpoint.desc)) {
struct dwc3 *dwc = dep->dwc;
struct dwc3_trb *trb;
--
2.7.4
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5cbab6303b4791a3e6713dfe2c5fda6a867f9adc Mon Sep 17 00:00:00 2001
From: Raju Rangoju <rajur(a)chelsio.com>
Date: Thu, 3 Jan 2019 23:05:31 +0530
Subject: [PATCH] nvmet-rdma: fix null dereference under heavy load
Under heavy load if we don't have any pre-allocated rsps left, we
dynamically allocate a rsp, but we are not actually allocating memory
for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
fields (req->rsp->status) in nvmet_req_init() will result in crash.
To fix this, allocate the memory for nvme_completion by calling
nvmet_rdma_alloc_rsp()
Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Max Gurtovoy <maxg(a)mellanox.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Raju Rangoju <rajur(a)chelsio.com>
Signed-off-by: Sagi Grimberg <sagi(a)grimberg.me>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index a8d23eb80192..a884e3a0e8af 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -139,6 +139,10 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc);
static void nvmet_rdma_qp_event(struct ib_event *event, void *priv);
static void nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue);
+static void nvmet_rdma_free_rsp(struct nvmet_rdma_device *ndev,
+ struct nvmet_rdma_rsp *r);
+static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
+ struct nvmet_rdma_rsp *r);
static const struct nvmet_fabrics_ops nvmet_rdma_ops;
@@ -182,9 +186,17 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
spin_unlock_irqrestore(&queue->rsps_lock, flags);
if (unlikely(!rsp)) {
- rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
+ int ret;
+
+ rsp = kzalloc(sizeof(*rsp), GFP_KERNEL);
if (unlikely(!rsp))
return NULL;
+ ret = nvmet_rdma_alloc_rsp(queue->dev, rsp);
+ if (unlikely(ret)) {
+ kfree(rsp);
+ return NULL;
+ }
+
rsp->allocated = true;
}
@@ -197,6 +209,7 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
unsigned long flags;
if (unlikely(rsp->allocated)) {
+ nvmet_rdma_free_rsp(rsp->queue->dev, rsp);
kfree(rsp);
return;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5cbab6303b4791a3e6713dfe2c5fda6a867f9adc Mon Sep 17 00:00:00 2001
From: Raju Rangoju <rajur(a)chelsio.com>
Date: Thu, 3 Jan 2019 23:05:31 +0530
Subject: [PATCH] nvmet-rdma: fix null dereference under heavy load
Under heavy load if we don't have any pre-allocated rsps left, we
dynamically allocate a rsp, but we are not actually allocating memory
for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
fields (req->rsp->status) in nvmet_req_init() will result in crash.
To fix this, allocate the memory for nvme_completion by calling
nvmet_rdma_alloc_rsp()
Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Max Gurtovoy <maxg(a)mellanox.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Raju Rangoju <rajur(a)chelsio.com>
Signed-off-by: Sagi Grimberg <sagi(a)grimberg.me>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index a8d23eb80192..a884e3a0e8af 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -139,6 +139,10 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc);
static void nvmet_rdma_qp_event(struct ib_event *event, void *priv);
static void nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue);
+static void nvmet_rdma_free_rsp(struct nvmet_rdma_device *ndev,
+ struct nvmet_rdma_rsp *r);
+static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
+ struct nvmet_rdma_rsp *r);
static const struct nvmet_fabrics_ops nvmet_rdma_ops;
@@ -182,9 +186,17 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
spin_unlock_irqrestore(&queue->rsps_lock, flags);
if (unlikely(!rsp)) {
- rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
+ int ret;
+
+ rsp = kzalloc(sizeof(*rsp), GFP_KERNEL);
if (unlikely(!rsp))
return NULL;
+ ret = nvmet_rdma_alloc_rsp(queue->dev, rsp);
+ if (unlikely(ret)) {
+ kfree(rsp);
+ return NULL;
+ }
+
rsp->allocated = true;
}
@@ -197,6 +209,7 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
unsigned long flags;
if (unlikely(rsp->allocated)) {
+ nvmet_rdma_free_rsp(rsp->queue->dev, rsp);
kfree(rsp);
return;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5cbab6303b4791a3e6713dfe2c5fda6a867f9adc Mon Sep 17 00:00:00 2001
From: Raju Rangoju <rajur(a)chelsio.com>
Date: Thu, 3 Jan 2019 23:05:31 +0530
Subject: [PATCH] nvmet-rdma: fix null dereference under heavy load
Under heavy load if we don't have any pre-allocated rsps left, we
dynamically allocate a rsp, but we are not actually allocating memory
for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
fields (req->rsp->status) in nvmet_req_init() will result in crash.
To fix this, allocate the memory for nvme_completion by calling
nvmet_rdma_alloc_rsp()
Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Max Gurtovoy <maxg(a)mellanox.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Raju Rangoju <rajur(a)chelsio.com>
Signed-off-by: Sagi Grimberg <sagi(a)grimberg.me>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index a8d23eb80192..a884e3a0e8af 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -139,6 +139,10 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc);
static void nvmet_rdma_qp_event(struct ib_event *event, void *priv);
static void nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue);
+static void nvmet_rdma_free_rsp(struct nvmet_rdma_device *ndev,
+ struct nvmet_rdma_rsp *r);
+static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
+ struct nvmet_rdma_rsp *r);
static const struct nvmet_fabrics_ops nvmet_rdma_ops;
@@ -182,9 +186,17 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
spin_unlock_irqrestore(&queue->rsps_lock, flags);
if (unlikely(!rsp)) {
- rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
+ int ret;
+
+ rsp = kzalloc(sizeof(*rsp), GFP_KERNEL);
if (unlikely(!rsp))
return NULL;
+ ret = nvmet_rdma_alloc_rsp(queue->dev, rsp);
+ if (unlikely(ret)) {
+ kfree(rsp);
+ return NULL;
+ }
+
rsp->allocated = true;
}
@@ -197,6 +209,7 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
unsigned long flags;
if (unlikely(rsp->allocated)) {
+ nvmet_rdma_free_rsp(rsp->queue->dev, rsp);
kfree(rsp);
return;
}