The patch titled
Subject: mm: migrate: don't rely on PageMovable() of newpage after unlocking it
has been added to the -mm tree. Its filename is
mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-dont-rely-on-pagemovabl…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-dont-rely-on-pagemovabl…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm: migrate: don't rely on PageMovable() of newpage after unlocking it
While debugging some crashes related to virtio-balloon deflation that
happened under the old balloon migration code, I stumbled over a race that
still exists today.
What we experienced:
drivers/virtio/virtio_balloon.c:release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Turns out after having added the page to a local list when dequeuing, the
page would suddenly be moved to an LRU list before we would free it via
the local list, corrupting both lists. So a page we own and that is !LRU
was moved to an LRU list.
In __unmap_and_move(), we lock the old and newpage and perform the
migration. In case of vitio-balloon, the new page will become movable,
the old page will no longer be movable.
However, after unlocking newpage, there is nothing stopping the newpage
from getting dequeued and freed by virtio-balloon. This
will result in the newpage
1. No longer having PageMovable()
2. Getting moved to the local list before finally freeing it (using
page->lru)
Back in the migration thread in __unmap_and_move(), we would after
unlocking the newpage suddenly no longer have PageMovable(newpage) and
will therefore call putback_lru_page(newpage), modifying page->lru
although that list is still in use by virtio-balloon.
To summarize, we have a race between migrating the newpage and checking
for PageMovable(newpage). Instead of checking PageMovable(newpage), we
can simply rely on is_lru of the original page.
Looks like this was introduced by d6d86c0a7f8d ("mm/balloon_compaction:
redesign ballooned pages management"), which was backported up to 3.12.
Old compaction code used PageBalloon() via -_is_movable_balloon_page()
instead of PageMovable(), however with the same semantics.
Link: http://lkml.kernel.org/r/20190128160403.16657-1-david@redhat.com
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Vratislav Bendel <vbendel(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Konstantin Khlebnikov <k.khlebnikov(a)samsung.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it
+++ a/mm/migrate.c
@@ -1135,10 +1135,12 @@ out:
* If migration is successful, decrease refcount of the newpage
* which will not free the page because new page owner increased
* refcounter. As well, if it is LRU page, add the page to LRU
- * list in here.
+ * list in here. Don't rely on PageMovable(newpage), as that could
+ * already have changed after unlocking newpage (e.g.
+ * virtio-balloon deflation).
*/
if (rc == MIGRATEPAGE_SUCCESS) {
- if (unlikely(__PageMovable(newpage)))
+ if (unlikely(!is_lru))
put_page(newpage);
else
putback_lru_page(newpage);
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-balloon-update-comment-about-isolation-migration-compaction.patch
mm-convert-pg_balloon-to-pg_offline.patch
kexec-export-pg_offline-to-vmcoreinfo.patch
xen-balloon-mark-inflated-pages-pg_offline.patch
hv_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline-v2.patch
pm-hibernate-use-pfn_to_online_page.patch
pm-hibernate-exclude-all-pageoffline-pages.patch
pm-hibernate-exclude-all-pageoffline-pages-v2.patch
mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch
The patch below does not apply to the 4.20-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a1e1cb72d96491277ede8d257ce6b48a381dd336 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Thu, 17 Jan 2019 10:48:01 -0500
Subject: [PATCH] dm: fix redundant IO accounting for bios that need splitting
The risk of redundant IO accounting was not taken into consideration
when commit 18a25da84354 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().
Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry. This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.
Before this fix:
/dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
bios are split on 32k boundaries.
# fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers
with debugging added:
[103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
[103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
[103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
...
16M written yet 136M (278528 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
278528
After this fix:
16M written and 16M (32768 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
32768
Fixes: 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable(a)vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney(a)redhat.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fcb97b0a5743..fbadda68e23b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1584,6 +1584,9 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
ci->sector = bio->bi_iter.bi_sector;
}
+#define __dm_part_stat_sub(part, field, subnd) \
+ (part_stat_get(part, field) -= (subnd))
+
/*
* Entry point to split a bio into clones and submit them to the targets.
*/
@@ -1638,6 +1641,19 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
struct bio *b = bio_split(bio, bio_sectors(bio) - ci.sector_count,
GFP_NOIO, &md->queue->bio_split);
ci.io->orig_bio = b;
+
+ /*
+ * Adjust IO stats for each split, otherwise upon queue
+ * reentry there will be redundant IO accounting.
+ * NOTE: this is a stop-gap fix, a proper fix involves
+ * significant refactoring of DM core's bio splitting
+ * (by eliminating DM's splitting and just using bio_split)
+ */
+ part_stat_lock();
+ __dm_part_stat_sub(&dm_disk(md)->part0,
+ sectors[op_stat_group(bio_op(bio))], ci.sector_count);
+ part_stat_unlock();
+
bio_chain(b, bio);
ret = generic_make_request(bio);
break;
From: "Gustavo A. R. Silva" <gustavo(a)embeddedor.com>
[ Upstream commit a37805098900a6e73a55b3a43b7d3bcd987bb3f4 ]
idx can be indirectly controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/gpu/drm/drm_bufs.c:1420 drm_legacy_freebufs() warn: potential
spectre issue 'dma->buflist' [r] (local cap)
Fix this by sanitizing idx before using it to index dma->buflist
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo(a)embeddedor.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181016095549.GA23586@embedd…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_bufs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index f1a204d253cc..ac22b8d86249 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -36,6 +36,8 @@
#include <drm/drmP.h>
#include "drm_legacy.h"
+#include <linux/nospec.h>
+
static struct drm_map_list *drm_find_matching_map(struct drm_device *dev,
struct drm_local_map *map)
{
@@ -1332,6 +1334,7 @@ int drm_legacy_freebufs(struct drm_device *dev, void *data,
idx, dma->buf_count - 1);
return -EINVAL;
}
+ idx = array_index_nospec(idx, dma->buf_count);
buf = dma->buflist[idx];
if (buf->file_priv != file_priv) {
DRM_ERROR("Process %d freeing buffer not owned\n",
--
2.19.1
From: "Gustavo A. R. Silva" <gustavo(a)embeddedor.com>
[ Upstream commit a37805098900a6e73a55b3a43b7d3bcd987bb3f4 ]
idx can be indirectly controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/gpu/drm/drm_bufs.c:1420 drm_legacy_freebufs() warn: potential
spectre issue 'dma->buflist' [r] (local cap)
Fix this by sanitizing idx before using it to index dma->buflist
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo(a)embeddedor.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181016095549.GA23586@embedd…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_bufs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index adb1dd7fde5f..9ccd7d702cd3 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -36,6 +36,8 @@
#include <drm/drmP.h>
#include "drm_legacy.h"
+#include <linux/nospec.h>
+
static struct drm_map_list *drm_find_matching_map(struct drm_device *dev,
struct drm_local_map *map)
{
@@ -1413,6 +1415,7 @@ int drm_legacy_freebufs(struct drm_device *dev, void *data,
idx, dma->buf_count - 1);
return -EINVAL;
}
+ idx = array_index_nospec(idx, dma->buf_count);
buf = dma->buflist[idx];
if (buf->file_priv != file_priv) {
DRM_ERROR("Process %d freeing buffer not owned\n",
--
2.19.1
Hello Sasha,
On Wed, Jan 23, 2019 at 10:57:44PM +0000, Sasha Levin wrote:
> [This is an automated email]
Not sure if I only state the obvious that was just missed by your
automation.
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: cbffaf7aa09e can: flexcan: Always use last mailbox for TX.
>
> The bot has tested the following trees: v4.20.3, v4.19.16.
>
> v4.20.3: Failed to apply! Possible dependencies:
> 0517961ccdf1 ("can: flexcan: Add provision for variable payload size")
> 22233f7bf2c9 ("can: flexcan: FLEXCAN_IFLAG_MB: add () around macro argument")
> 5156c7b11f35 ("can: flexcan: move rx_offload_add() from flexcan_probe() to flexcan_open()")
> 7ad0f53a394b ("can: flexcan: flexcan_chip_start(): enable loopback mode in flexcan")
> de3578c198c6 ("can: flexcan: add self wakeup support")
>
> v4.19.16: Failed to apply! Possible dependencies:
> 0517961ccdf1 ("can: flexcan: Add provision for variable payload size")
> 22233f7bf2c9 ("can: flexcan: FLEXCAN_IFLAG_MB: add () around macro argument")
> 5156c7b11f35 ("can: flexcan: move rx_offload_add() from flexcan_probe() to flexcan_open()")
> 7ad0f53a394b ("can: flexcan: flexcan_chip_start(): enable loopback mode in flexcan")
> de3578c198c6 ("can: flexcan: add self wakeup support")
>
>
> How should we proceed with this patch?
When I posted the commit I also posted one that fits on top of v4.19.x.
I would expect that the same also fits on v4.20.x. If not, tell me.
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a55234dabe1f72cf22f9197980751d37e38ba020 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Fri, 11 Jan 2019 12:20:41 +0100
Subject: [PATCH] can: flexcan: fix NULL pointer exception during bringup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit cbffaf7aa09e ("can: flexcan: Always use last mailbox for TX")
introduced a loop letting i run up to (including) ARRAY_SIZE(regs->mb)
and in the body accessed regs->mb[i] which is an out-of-bounds array
access that then resulted in an access to an reserved register area.
Later this was changed by commit 0517961ccdf1 ("can: flexcan: Add
provision for variable payload size") to iterate a bit differently but
still runs one iteration too much resulting to call
flexcan_get_mb(priv, priv->mb_count)
which results in a WARN_ON and then a NULL pointer exception. This
only affects devices compatible with "fsl,p1010-flexcan",
"fsl,imx53-flexcan", "fsl,imx35-flexcan", "fsl,imx25-flexcan",
"fsl,imx28-flexcan", so newer i.MX SoCs are not affected.
Fixes: cbffaf7aa09e ("can: flexcan: Always use last mailbox for TX")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: linux-stable <stable(a)vger.kernel.org> # >= 4.20
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 5f097648d12d..1c66fb2ad76b 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -1106,7 +1106,7 @@ static int flexcan_chip_start(struct net_device *dev)
}
} else {
/* clear and invalidate unused mailboxes first */
- for (i = FLEXCAN_TX_MB_RESERVED_OFF_FIFO; i <= priv->mb_count; i++) {
+ for (i = FLEXCAN_TX_MB_RESERVED_OFF_FIFO; i < priv->mb_count; i++) {
mb = flexcan_get_mb(priv, i);
priv->write(FLEXCAN_MB_CODE_RX_INACTIVE,
&mb->can_ctrl);
The patch below does not apply to the 4.20-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a55234dabe1f72cf22f9197980751d37e38ba020 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Fri, 11 Jan 2019 12:20:41 +0100
Subject: [PATCH] can: flexcan: fix NULL pointer exception during bringup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit cbffaf7aa09e ("can: flexcan: Always use last mailbox for TX")
introduced a loop letting i run up to (including) ARRAY_SIZE(regs->mb)
and in the body accessed regs->mb[i] which is an out-of-bounds array
access that then resulted in an access to an reserved register area.
Later this was changed by commit 0517961ccdf1 ("can: flexcan: Add
provision for variable payload size") to iterate a bit differently but
still runs one iteration too much resulting to call
flexcan_get_mb(priv, priv->mb_count)
which results in a WARN_ON and then a NULL pointer exception. This
only affects devices compatible with "fsl,p1010-flexcan",
"fsl,imx53-flexcan", "fsl,imx35-flexcan", "fsl,imx25-flexcan",
"fsl,imx28-flexcan", so newer i.MX SoCs are not affected.
Fixes: cbffaf7aa09e ("can: flexcan: Always use last mailbox for TX")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: linux-stable <stable(a)vger.kernel.org> # >= 4.20
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 5f097648d12d..1c66fb2ad76b 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -1106,7 +1106,7 @@ static int flexcan_chip_start(struct net_device *dev)
}
} else {
/* clear and invalidate unused mailboxes first */
- for (i = FLEXCAN_TX_MB_RESERVED_OFF_FIFO; i <= priv->mb_count; i++) {
+ for (i = FLEXCAN_TX_MB_RESERVED_OFF_FIFO; i < priv->mb_count; i++) {
mb = flexcan_get_mb(priv, i);
priv->write(FLEXCAN_MB_CODE_RX_INACTIVE,
&mb->can_ctrl);
From: "Gustavo A. R. Silva" <gustavo(a)embeddedor.com>
[ Upstream commit a37805098900a6e73a55b3a43b7d3bcd987bb3f4 ]
idx can be indirectly controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/gpu/drm/drm_bufs.c:1420 drm_legacy_freebufs() warn: potential
spectre issue 'dma->buflist' [r] (local cap)
Fix this by sanitizing idx before using it to index dma->buflist
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo(a)embeddedor.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181016095549.GA23586@embedd…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_bufs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index 1ee84dd802d4..0f05b8d8fefa 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -36,6 +36,8 @@
#include <drm/drmP.h>
#include "drm_legacy.h"
+#include <linux/nospec.h>
+
static struct drm_map_list *drm_find_matching_map(struct drm_device *dev,
struct drm_local_map *map)
{
@@ -1417,6 +1419,7 @@ int drm_legacy_freebufs(struct drm_device *dev, void *data,
idx, dma->buf_count - 1);
return -EINVAL;
}
+ idx = array_index_nospec(idx, dma->buf_count);
buf = dma->buflist[idx];
if (buf->file_priv != file_priv) {
DRM_ERROR("Process %d freeing buffer not owned\n",
--
2.19.1