This is a note to let you know that I've just added the patch titled
dm: ensure bio submission follows a depth-first tree walk
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:03:39 CET 2018
From: NeilBrown <neilb(a)suse.com>
Date: Wed, 6 Sep 2017 09:43:28 +1000
Subject: dm: ensure bio submission follows a depth-first tree walk
From: NeilBrown <neilb(a)suse.com>
[ Upstream commit 18a25da84354c6bb655320de6072c00eda6eb602 ]
A dm device can, in general, represent a tree of targets, each of which
handles a sub-range of the range of blocks handled by the parent.
The bio sequencing managed by generic_make_request() requires that bios
are generated and handled in a depth-first manner. Each call to a
make_request_fn() may submit bios to a single member device, and may
submit bios for a reduced region of the same device as the
make_request_fn.
In particular, any bios submitted to member devices must be expected to
be processed in order, so a later one must never wait for an earlier
one.
This ordering is usually achieved by using bio_split() to reduce a bio
to a size that can be completely handled by one target, and resubmitting
the remainder to the originating device. bio_queue_split() shows the
canonical approach.
dm doesn't follow this approach, largely because it has needed to split
bios since long before bio_split() was available. It currently can
submit bios to separate targets within the one dm_make_request() call.
Dependencies between these targets, as can happen with dm-snap, can
cause deadlocks if either bios gets stuck behind the other in the queues
managed by generic_make_request(). This requires the 'rescue'
functionality provided by dm_offload_{start,end}.
Some of this requirement can be removed by changing the order of bio
submission to follow the canonical approach. That is, if dm finds that
it needs to split a bio, the remainder should be sent to
generic_make_request() rather than being handled immediately. This
delays the handling until the first part is completely processed, so the
deadlock problems do not occur.
__split_and_process_bio() can be called both from dm_make_request() and
from dm_wq_work(). When called from dm_wq_work() the current approach
is perfectly satisfactory as each bio will be processed immediately.
When called from dm_make_request(), current->bio_list will be non-NULL,
and in this case it is best to create a separate "clone" bio for the
remainder.
When we use bio_clone_bioset() to split off the front part of a bio
and chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated
bio is used as the head to be processed immediately, and the original
bio gets "bio_advance()"d and sent to generic_make_request() as the
remainder. Otherwise, if the newly allocated bio is used as the
remainder, and if it then needs to be split again, then the next
bio_clone_bioset() call will be made while holding a reference a bio
(result of the first clone) from the same bioset. This can potentially
exhaust the bioset mempool and result in a memory allocation deadlock.
Note that there is no race caused by reassigning cio.io->bio after already
calling __map_bio(). This bio will only be dereferenced again after
dec_pending() has found io->io_count to be zero, and this cannot happen
before the dec_pending() call at the end of __split_and_process_bio().
To provide the clone bio when splitting, we use q->bio_split. This
was previously being freed by bio-based dm to avoid having excess
rescuer threads. As bio_split bio sets no longer create rescuer
threads, there is little cost and much gain from restoring the
q->bio_split bio set.
Signed-off-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm.c | 33 ++++++++++++++++++++++++---------
1 file changed, 24 insertions(+), 9 deletions(-)
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1499,8 +1499,29 @@ static void __split_and_process_bio(stru
} else {
ci.bio = bio;
ci.sector_count = bio_sectors(bio);
- while (ci.sector_count && !error)
+ while (ci.sector_count && !error) {
error = __split_and_process_non_flush(&ci);
+ if (current->bio_list && ci.sector_count && !error) {
+ /*
+ * Remainder must be passed to generic_make_request()
+ * so that it gets handled *after* bios already submitted
+ * have been completely processed.
+ * We take a clone of the original to store in
+ * ci.io->bio to be used by end_io_acct() and
+ * for dec_pending to use for completion handling.
+ * As this path is not used for REQ_OP_ZONE_REPORT,
+ * the usage of io->bio in dm_remap_zone_report()
+ * won't be affected by this reassignment.
+ */
+ struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
+ md->queue->bio_split);
+ ci.io->bio = b;
+ bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
+ bio_chain(b, bio);
+ generic_make_request(bio);
+ break;
+ }
+ }
}
/* drop the extra reference count */
@@ -1511,8 +1532,8 @@ static void __split_and_process_bio(stru
*---------------------------------------------------------------*/
/*
- * The request function that just remaps the bio built up by
- * dm_merge_bvec.
+ * The request function that remaps the bio to one target and
+ * splits off any remainder.
*/
static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
{
@@ -2035,12 +2056,6 @@ int dm_setup_md_queue(struct mapped_devi
case DM_TYPE_DAX_BIO_BASED:
dm_init_normal_md_queue(md);
blk_queue_make_request(md->queue, dm_make_request);
- /*
- * DM handles splitting bios as needed. Free the bio_split bioset
- * since it won't be used (saves 1 process per bio-based DM device).
- */
- bioset_free(md->queue->bio_split);
- md->queue->bio_split = NULL;
if (type == DM_TYPE_DAX_BIO_BASED)
queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue);
Patches currently in stable-queue which might be from neilb(a)suse.com are
queue-4.15/dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
This is a note to let you know that I've just added the patch titled
iio: st_pressure: st_accel: Initialise sensor platform data properly
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iio-st_pressure-st_accel-initialise-sensor-platform-data-properly.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:32 CET 2018
From: Shrirang Bagul <shrirang.bagul(a)canonical.com>
Date: Wed, 19 Apr 2017 22:05:00 +0800
Subject: iio: st_pressure: st_accel: Initialise sensor platform data properly
From: Shrirang Bagul <shrirang.bagul(a)canonical.com>
[ Upstream commit 7383d44b84c94aaca4bf695a6bd8a69f2295ef1a ]
This patch fixes the sensor platform data initialisation for st_pressure
and st_accel device drivers. Without this patch, the driver fails to
register the sensors when the user removes and re-loads the driver.
1. Unload the kernel modules for st_pressure
$ sudo rmmod st_pressure_i2c
$ sudo rmmod st_pressure
2. Re-load the driver
$ sudo insmod st_pressure
$ sudo insmod st_pressure_i2c
Signed-off-by: Jonathan Cameron <jic23(a)kernel.org>
Acked-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/accel/st_accel_core.c | 7 ++++---
drivers/iio/pressure/st_pressure_core.c | 8 ++++----
2 files changed, 8 insertions(+), 7 deletions(-)
--- a/drivers/iio/accel/st_accel_core.c
+++ b/drivers/iio/accel/st_accel_core.c
@@ -628,6 +628,8 @@ static const struct iio_trigger_ops st_a
int st_accel_common_probe(struct iio_dev *indio_dev)
{
struct st_sensor_data *adata = iio_priv(indio_dev);
+ struct st_sensors_platform_data *pdata =
+ (struct st_sensors_platform_data *)adata->dev->platform_data;
int irq = adata->get_irq_data_ready(indio_dev);
int err;
@@ -652,9 +654,8 @@ int st_accel_common_probe(struct iio_dev
&adata->sensor_settings->fs.fs_avl[0];
adata->odr = adata->sensor_settings->odr.odr_avl[0].hz;
- if (!adata->dev->platform_data)
- adata->dev->platform_data =
- (struct st_sensors_platform_data *)&default_accel_pdata;
+ if (!pdata)
+ pdata = (struct st_sensors_platform_data *)&default_accel_pdata;
err = st_sensors_init_sensor(indio_dev, adata->dev->platform_data);
if (err < 0)
--- a/drivers/iio/pressure/st_pressure_core.c
+++ b/drivers/iio/pressure/st_pressure_core.c
@@ -436,6 +436,8 @@ static const struct iio_trigger_ops st_p
int st_press_common_probe(struct iio_dev *indio_dev)
{
struct st_sensor_data *press_data = iio_priv(indio_dev);
+ struct st_sensors_platform_data *pdata =
+ (struct st_sensors_platform_data *)press_data->dev->platform_data;
int irq = press_data->get_irq_data_ready(indio_dev);
int err;
@@ -464,10 +466,8 @@ int st_press_common_probe(struct iio_dev
press_data->odr = press_data->sensor_settings->odr.odr_avl[0].hz;
/* Some devices don't support a data ready pin. */
- if (!press_data->dev->platform_data &&
- press_data->sensor_settings->drdy_irq.addr)
- press_data->dev->platform_data =
- (struct st_sensors_platform_data *)&default_press_pdata;
+ if (!pdata && press_data->sensor_settings->drdy_irq.addr)
+ pdata = (struct st_sensors_platform_data *)&default_press_pdata;
err = st_sensors_init_sensor(indio_dev, press_data->dev->platform_data);
if (err < 0)
Patches currently in stable-queue which might be from shrirang.bagul(a)canonical.com are
queue-4.4/iio-st_pressure-st_accel-initialise-sensor-platform-data-properly.patch
This is a note to let you know that I've just added the patch titled
infiniband/uverbs: Fix integer overflows
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
infiniband-uverbs-fix-integer-overflows.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:32 CET 2018
From: Vlad Tsyrklevich <vlad(a)tsyrklevich.net>
Date: Fri, 24 Mar 2017 15:55:17 -0400
Subject: infiniband/uverbs: Fix integer overflows
From: Vlad Tsyrklevich <vlad(a)tsyrklevich.net>
[ Upstream commit 4f7f4dcfff2c19debbcdbcc861c325610a15e0c5 ]
The 'num_sge' variable is verfied to be smaller than the 'sge_count'
variable; however, since both are user-controlled it's possible to cause
an integer overflow for the kmalloc multiply on 32-bit platforms
(num_sge and sge_count are both defined u32). By crafting an input that
causes a smaller-than-expected allocation it's possible to write
controlled data out-of-bounds.
Signed-off-by: Vlad Tsyrklevich <vlad(a)tsyrklevich.net>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/core/uverbs_cmd.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2436,9 +2436,13 @@ ssize_t ib_uverbs_destroy_qp(struct ib_u
static void *alloc_wr(size_t wr_size, __u32 num_sge)
{
+ if (num_sge >= (U32_MAX - ALIGN(wr_size, sizeof (struct ib_sge))) /
+ sizeof (struct ib_sge))
+ return NULL;
+
return kmalloc(ALIGN(wr_size, sizeof (struct ib_sge)) +
num_sge * sizeof (struct ib_sge), GFP_KERNEL);
-};
+}
ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
struct ib_device *ib_dev,
@@ -2664,6 +2668,13 @@ static struct ib_recv_wr *ib_uverbs_unma
ret = -EINVAL;
goto err;
}
+
+ if (user_wr->num_sge >=
+ (U32_MAX - ALIGN(sizeof *next, sizeof (struct ib_sge))) /
+ sizeof (struct ib_sge)) {
+ ret = -EINVAL;
+ goto err;
+ }
next = kmalloc(ALIGN(sizeof *next, sizeof (struct ib_sge)) +
user_wr->num_sge * sizeof (struct ib_sge),
Patches currently in stable-queue which might be from vlad(a)tsyrklevich.net are
queue-4.4/infiniband-uverbs-fix-integer-overflows.patch
This is a note to let you know that I've just added the patch titled
IB/mlx4: Take write semaphore when changing the vma struct
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-mlx4-take-write-semaphore-when-changing-the-vma-struct.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:32 CET 2018
From: Maor Gottlieb <maorg(a)mellanox.com>
Date: Wed, 29 Mar 2017 06:03:00 +0300
Subject: IB/mlx4: Take write semaphore when changing the vma struct
From: Maor Gottlieb <maorg(a)mellanox.com>
[ Upstream commit 22c3653d04bd0c67b75e99d85e0c0bdf83947df5 ]
When the driver disassociate user context, it changes the vma to
anonymous by setting the vm_ops to null and zap the vma ptes.
In order to avoid race in the kernel, we need to take write lock
before we change the vma entries.
Fixes: ae184ddeca5db ('IB/mlx4_ib: Disassociate support')
Signed-off-by: Maor Gottlieb <maorg(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/mlx4/main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1041,7 +1041,7 @@ static void mlx4_ib_disassociate_ucontex
/* need to protect from a race on closing the vma as part of
* mlx4_ib_vma_close().
*/
- down_read(&owning_mm->mmap_sem);
+ down_write(&owning_mm->mmap_sem);
for (i = 0; i < HW_BAR_COUNT; i++) {
vma = context->hw_bar_info[i].vma;
if (!vma)
@@ -1059,7 +1059,7 @@ static void mlx4_ib_disassociate_ucontex
context->hw_bar_info[i].vma->vm_ops = NULL;
}
- up_read(&owning_mm->mmap_sem);
+ up_write(&owning_mm->mmap_sem);
mmput(owning_mm);
put_task_struct(owning_process);
}
Patches currently in stable-queue which might be from maorg(a)mellanox.com are
queue-4.4/ib-mlx4-change-vma-from-shared-to-private.patch
queue-4.4/ib-mlx4-take-write-semaphore-when-changing-the-vma-struct.patch
This is a note to let you know that I've just added the patch titled
IB/umem: Fix use of npages/nmap fields
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-umem-fix-use-of-npages-nmap-fields.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:33 CET 2018
From: Artemy Kovalyov <artemyko(a)mellanox.com>
Date: Tue, 14 Nov 2017 14:51:59 +0200
Subject: IB/umem: Fix use of npages/nmap fields
From: Artemy Kovalyov <artemyko(a)mellanox.com>
[ Upstream commit edf1a84fe37c51290e2c88154ecaf48dadff3d27 ]
In ib_umem structure npages holds original number of sg entries, while
nmap is number of DMA blocks returned by dma_map_sg.
Fixes: c5d76f130b28 ('IB/core: Add umem function to read data from user-space')
Signed-off-by: Artemy Kovalyov <artemyko(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/core/umem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -354,7 +354,7 @@ int ib_umem_copy_from(void *dst, struct
return -EINVAL;
}
- ret = sg_pcopy_to_buffer(umem->sg_head.sgl, umem->nmap, dst, length,
+ ret = sg_pcopy_to_buffer(umem->sg_head.sgl, umem->npages, dst, length,
offset + ib_umem_offset(umem));
if (ret < 0)
Patches currently in stable-queue which might be from artemyko(a)mellanox.com are
queue-4.4/ib-umem-fix-use-of-npages-nmap-fields.patch
This is a note to let you know that I've just added the patch titled
IB/ipoib: Update broadcast object if PKey value was changed in index 0
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-ipoib-update-broadcast-object-if-pkey-value-was-changed-in-index-0.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:32 CET 2018
From: Feras Daoud <ferasda(a)mellanox.com>
Date: Sun, 19 Mar 2017 11:18:54 +0200
Subject: IB/ipoib: Update broadcast object if PKey value was changed in index 0
From: Feras Daoud <ferasda(a)mellanox.com>
[ Upstream commit 9a9b8112699d78e7f317019b37f377e90023f3ed ]
Update the broadcast address in the priv->broadcast object when the
Pkey value changes in index 0, otherwise the multicast GID value will
keep the previous value of the PKey, and will not be updated.
This leads to interface state down because the interface will keep the
old PKey value.
For example, in SR-IOV environment, if the PF changes the value of PKey
index 0 for one of the VFs, then the VF receives PKey change event that
triggers heavy flush. This flush calls update_parent_pkey that update the
broadcast object and its relevant members. If in this case the multicast
GID will not be updated, the interface state will be down.
Fixes: c2904141696e ("IPoIB: Fix pkey change flow for virtualization environments")
Signed-off-by: Feras Daoud <ferasda(a)mellanox.com>
Signed-off-by: Erez Shitrit <erezsh(a)mellanox.com>
Reviewed-by: Alex Vesker <valex(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -945,6 +945,19 @@ static inline int update_parent_pkey(str
*/
priv->dev->broadcast[8] = priv->pkey >> 8;
priv->dev->broadcast[9] = priv->pkey & 0xff;
+
+ /*
+ * Update the broadcast address in the priv->broadcast object,
+ * in case it already exists, otherwise no one will do that.
+ */
+ if (priv->broadcast) {
+ spin_lock_irq(&priv->lock);
+ memcpy(priv->broadcast->mcmember.mgid.raw,
+ priv->dev->broadcast + 4,
+ sizeof(union ib_gid));
+ spin_unlock_irq(&priv->lock);
+ }
+
return 0;
}
Patches currently in stable-queue which might be from ferasda(a)mellanox.com are
queue-4.4/ib-ipoib-fix-deadlock-between-ipoib_stop-and-mcast-join-flow.patch
queue-4.4/ib-ipoib-update-broadcast-object-if-pkey-value-was-changed-in-index-0.patch
This is a note to let you know that I've just added the patch titled
IB/mlx4: Change vma from shared to private
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-mlx4-change-vma-from-shared-to-private.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:32 CET 2018
From: Maor Gottlieb <maorg(a)mellanox.com>
Date: Wed, 29 Mar 2017 06:03:01 +0300
Subject: IB/mlx4: Change vma from shared to private
From: Maor Gottlieb <maorg(a)mellanox.com>
[ Upstream commit ca37a664a8e4e9988b220988ceb4d79e3316f195 ]
Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
it would lead to SIGBUS.
Remove the shared flags from the vma after we change it to be
anonymous.
This is easily reproduced by doing modprobe -r while running a
user-space application such as raw_ethernet_bw.
Fixes: ae184ddeca5db ('IB/mlx4_ib: Disassociate support')
Signed-off-by: Maor Gottlieb <maorg(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/mlx4/main.c | 2 ++
1 file changed, 2 insertions(+)
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1055,6 +1055,8 @@ static void mlx4_ib_disassociate_ucontex
BUG_ON(1);
}
+ context->hw_bar_info[i].vma->vm_flags &=
+ ~(VM_SHARED | VM_MAYSHARE);
/* context going to be destroyed, should not access ops any more */
context->hw_bar_info[i].vma->vm_ops = NULL;
}
Patches currently in stable-queue which might be from maorg(a)mellanox.com are
queue-4.4/ib-mlx4-change-vma-from-shared-to-private.patch
queue-4.4/ib-mlx4-take-write-semaphore-when-changing-the-vma-struct.patch
This is a note to let you know that I've just added the patch titled
IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-ipoib-fix-deadlock-between-ipoib_stop-and-mcast-join-flow.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:32 CET 2018
From: Feras Daoud <ferasda(a)mellanox.com>
Date: Sun, 19 Mar 2017 11:18:55 +0200
Subject: IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
From: Feras Daoud <ferasda(a)mellanox.com>
[ Upstream commit 3e31a490e01a6e67cbe9f6e1df2f3ff0fbf48972 ]
Before calling ipoib_stop, rtnl_lock should be taken, then
the flow clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP
flags, and waits for mcast completion if IPOIB_MCAST_FLAG_BUSY
is set.
On the other hand, the flow of multicast join task initializes
a mcast completion, sets the IPOIB_MCAST_FLAG_BUSY and calls
ipoib_mcast_join. If IPOIB_FLAG_OPER_UP flag is not set, this
call returns EINVAL without setting the mcast completion and
leads to a deadlock.
ipoib_stop |
| |
clear_bit(IPOIB_FLAG_ADMIN_UP) |
| |
Context Switch |
| ipoib_mcast_join_task
| |
| spin_lock_irq(lock)
| |
| init_completion(mcast)
| |
| set_bit(IPOIB_MCAST_FLAG_BUSY)
| |
| Context Switch
| |
clear_bit(IPOIB_FLAG_OPER_UP) |
| |
spin_lock_irqsave(lock) |
| |
Context Switch |
| ipoib_mcast_join
| return (-EINVAL)
| |
| spin_unlock_irq(lock)
| |
| Context Switch
| |
ipoib_mcast_dev_flush |
wait_for_completion(mcast) |
ipoib_stop will wait for mcast completion for ever, and will
not release the rtnl_lock. As a result panic occurs with the
following trace:
[13441.639268] Call Trace:
[13441.640150] [<ffffffff8168b579>] schedule+0x29/0x70
[13441.641038] [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
[13441.641914] [<ffffffff810bc017>] ? complete+0x47/0x50
[13441.642765] [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
[13441.643580] [<ffffffff8168b956>] wait_for_completion+0x116/0x170
[13441.644434] [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
[13441.645293] [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
[13441.646159] [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
[13441.647013] [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]
Fixes: 08bc327629cb ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud <ferasda(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -473,6 +473,9 @@ static int ipoib_mcast_join(struct net_d
!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
return -EINVAL;
+ init_completion(&mcast->done);
+ set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+
ipoib_dbg_mcast(priv, "joining MGID %pI6\n", mcast->mcmember.mgid.raw);
rec.mgid = mcast->mcmember.mgid;
@@ -631,8 +634,6 @@ void ipoib_mcast_join_task(struct work_s
if (mcast->backoff == 1 ||
time_after_eq(jiffies, mcast->delay_until)) {
/* Found the next unjoined group */
- init_completion(&mcast->done);
- set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
if (ipoib_mcast_join(dev, mcast)) {
spin_unlock_irq(&priv->lock);
return;
@@ -652,11 +653,9 @@ out:
queue_delayed_work(priv->wq, &priv->mcast_task,
delay_until - jiffies);
}
- if (mcast) {
- init_completion(&mcast->done);
- set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+ if (mcast)
ipoib_mcast_join(dev, mcast);
- }
+
spin_unlock_irq(&priv->lock);
}
Patches currently in stable-queue which might be from ferasda(a)mellanox.com are
queue-4.4/ib-ipoib-fix-deadlock-between-ipoib_stop-and-mcast-join-flow.patch
queue-4.4/ib-ipoib-update-broadcast-object-if-pkey-value-was-changed-in-index-0.patch
This is a note to let you know that I've just added the patch titled
IB/ipoib: Avoid memory leak if the SA returns a different DGID
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-ipoib-avoid-memory-leak-if-the-sa-returns-a-different-dgid.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:33 CET 2018
From: Erez Shitrit <erezsh(a)mellanox.com>
Date: Tue, 14 Nov 2017 14:51:53 +0200
Subject: IB/ipoib: Avoid memory leak if the SA returns a different DGID
From: Erez Shitrit <erezsh(a)mellanox.com>
[ Upstream commit 439000892ee17a9c92f1e4297818790ef8bb4ced ]
The ipoib path database is organized around DGIDs from the LLADDR, but the
SA is free to return a different GID when asked for path. This causes a
bug because the SA's modified DGID is copied into the database key, even
though it is no longer the correct lookup key, causing a memory leak and
other malfunctions.
Ensure the database key does not change after the SA query completes.
Demonstration of the bug is as follows
ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it
creates new record in the DB with that gid as a key, and issues a new
request to the SM.
Now, the SM from some reason returns path-record with other SGID (for
example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local
subnet prefix) now ipoib will overwrite the current entry with the new
one, and if new request to the original GID arrives ipoib will not find
it in the DB (was overwritten) and will create new record that in its
turn will also be overwritten by the response from the SM, and so on
till the driver eats all the device memory.
Signed-off-by: Erez Shitrit <erezsh(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -724,6 +724,22 @@ static void path_rec_completion(int stat
spin_lock_irqsave(&priv->lock, flags);
if (!IS_ERR_OR_NULL(ah)) {
+ /*
+ * pathrec.dgid is used as the database key from the LLADDR,
+ * it must remain unchanged even if the SA returns a different
+ * GID to use in the AH.
+ */
+ if (memcmp(pathrec->dgid.raw, path->pathrec.dgid.raw,
+ sizeof(union ib_gid))) {
+ ipoib_dbg(
+ priv,
+ "%s got PathRec for gid %pI6 while asked for %pI6\n",
+ dev->name, pathrec->dgid.raw,
+ path->pathrec.dgid.raw);
+ memcpy(pathrec->dgid.raw, path->pathrec.dgid.raw,
+ sizeof(union ib_gid));
+ }
+
path->pathrec = *pathrec;
old_ah = path->ah;
Patches currently in stable-queue which might be from erezsh(a)mellanox.com are
queue-4.4/ib-ipoib-update-broadcast-object-if-pkey-value-was-changed-in-index-0.patch
queue-4.4/ib-ipoib-avoid-memory-leak-if-the-sa-returns-a-different-dgid.patch
This is a note to let you know that I've just added the patch titled
ia64: fix module loading for gcc-5.4
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ia64-fix-module-loading-for-gcc-5.4.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:57:32 CET 2018
From: Sergei Trofimovich <slyfox(a)gentoo.org>
Date: Mon, 1 May 2017 11:51:55 -0700
Subject: ia64: fix module loading for gcc-5.4
From: Sergei Trofimovich <slyfox(a)gentoo.org>
[ Upstream commit a25fb8508c1b80dce742dbeaa4d75a1e9f2c5617 ]
Starting from gcc-5.4+ gcc generates MLX instructions in more cases to
refer local symbols:
https://gcc.gnu.org/PR60465
That caused ia64 module loader to choke on such instructions:
fuse: invalid slot number 1 for IMM64
The Linux kernel used to handle only case where relocation pointed to
slot=2 instruction in the bundle. That limitation was fixed in linux by
commit 9c184a073bfd ("[IA64] Fix 2.6 kernel for the new ia64 assembler")
See
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433
This change lifts the slot=2 restriction from the kernel module loader.
Tested on 'fuse' and 'btrfs' kernel modules.
Cc: Markus Elfring <elfring(a)users.sourceforge.net>
Cc: H J Lu <hjl.tools(a)gmail.com>
Cc: Fenghua Yu <fenghua.yu(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Bug: https://bugs.gentoo.org/601014
Tested-by: Émeric MASCHINO <emeric.maschino(a)gmail.com>
Signed-off-by: Sergei Trofimovich <slyfox(a)gentoo.org>
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/ia64/kernel/module.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -153,7 +153,7 @@ slot (const struct insn *insn)
static int
apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
{
- if (slot(insn) != 2) {
+ if (slot(insn) != 1 && slot(insn) != 2) {
printk(KERN_ERR "%s: invalid slot number %d for IMM64\n",
mod->name, slot(insn));
return 0;
@@ -165,7 +165,7 @@ apply_imm64 (struct module *mod, struct
static int
apply_imm60 (struct module *mod, struct insn *insn, uint64_t val)
{
- if (slot(insn) != 2) {
+ if (slot(insn) != 1 && slot(insn) != 2) {
printk(KERN_ERR "%s: invalid slot number %d for IMM60\n",
mod->name, slot(insn));
return 0;
Patches currently in stable-queue which might be from slyfox(a)gentoo.org are
queue-4.4/ia64-fix-module-loading-for-gcc-5.4.patch