On Wed 22-11-17 04:18:35, Zi Yan wrote:
> On 22 Nov 2017, at 3:54, Michal Hocko wrote:
[...]
> > I would keep the two checks consistent. But that leads to a more
> > interesting question. new_page_nodemask does
> >
> > if (thp_migration_supported() && PageTransHuge(page)) {
> > order = HPAGE_PMD_ORDER;
> > gfp_mask |= GFP_TRANSHUGE;
> > }
> >
> > How come it is safe to allocate an order-0 page if
> > !thp_migration_supported() when we are about to migrate THP? This
> > doesn't make any sense to me. Are we working around this somewhere else?
> > Why shouldn't we simply return NULL here?
>
> If !thp_migration_supported(), we will first split a THP and migrate
> its head page. This process is done in unmap_and_move() after
> get_new_page() (the function pointer to this new_page_nodemask()) is
> called. The situation can be PageTransHuge(page) is true here, but the
> page is split in unmap_and_move(), so we want to return a order-0 page
> here.
This deserves a big fat comment in the code because this is not clear
from the code!
> I think the confusion comes from that there is no guarantee of THP
> allocation when we are doing THP migration. If we can allocate a THP
> during THP migration, we are good. Otherwise, we want to fallback to
> the old way, splitting the original THP and migrating the head page,
> to preserve the original code behavior.
I understand that but that should be done explicitly rather than relying
on two functions doing the right thing because this is just too fragile.
Moreover I am not really sure this is really working properly. Just look
at the split_huge_page. It moves all the tail pages to the LRU list
while migrate_pages has a list of pages to migrate. So we will migrate
the head page and all the rest will get back to the LRU list. What
guarantees that they will get migrated as well.
This all looks like a mess!
--
Michal Hocko
SUSE Labs
On Tue 21 Nov 2017, 17:35, Zi Yan wrote:
> On 21 Nov 2017, at 17:12, Andrew Morton wrote:
>
> > On Mon, 20 Nov 2017 21:18:55 -0500 Zi Yan <zi.yan(a)sent.com> wrote:
> >
> >> This patch fixes it by only calling prep_transhuge_page() when we are
> >> certain that the target page is THP.
> >
> > What are the user-visible effects of the bug?
>
> By inspecting the code, if called on a non-THP, prep_transhuge_page() will
> 1) change the value of the mapping of (page + 2), since it is used for THP deferred list;
> 2) change the lru value of (page + 1), since it is used for THP’s dtor.
>
> Both can lead to data corruption of these two pages.
Pragmatically and from the point of view of the memory_hotplug subsys,
the effect is a kernel crash when pages are being migrated during a memory
hot remove offline and migration target pages are found in a bad state.
Best,
Andrea
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.9/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.4/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -447,8 +447,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.14/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 4.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-4.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-4.13/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
This is a note to let you know that I've just added the patch titled
coda: fix 'kernel memory exposure attempt' in fsync
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d337b66a4c52c7b04eec661d86c2ef6e168965a2 Mon Sep 17 00:00:00 2001
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
Date: Wed, 27 Sep 2017 15:52:12 -0400
Subject: coda: fix 'kernel memory exposure attempt' in fsync
From: Jan Harkes <jaharkes(a)cs.cmu.edu>
commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.
When an application called fsync on a file in Coda a small request with
just the file identifier was allocated, but the declared length was set
to the size of union of all possible upcall requests.
This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.
The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
Signed-off-by: Jan Harkes <jaharkes(a)cs.cmu.edu>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/coda/upcall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
UPARG(CODA_FSYNC);
inp->coda_fsync.VFid = *fid;
- error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
- &outsize, inp);
+ error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
CODA_FREE(inp, insize);
return error;
Patches currently in stable-queue which might be from jaharkes(a)cs.cmu.edu are
queue-3.18/coda-fix-kernel-memory-exposure-attempt-in-fsync.patch
The patch below does not apply to the 4.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e9a6effa500526e2a19d5ad042cb758b55b1ef93 Mon Sep 17 00:00:00 2001
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Date: Wed, 15 Nov 2017 17:33:15 -0800
Subject: [PATCH] mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries
after the fault swap entry. The readahead algorithm calculates some of
the swap entries to readahead via increasing the offset of the fault
swap entry without checking whether they are beyond the end of the swap
device and it relys on the __swp_swapcount() and swapcache_prepare() to
check it. Although __swp_swapcount() checks for the swap entry passed
in, it will complain with the error message as follow for the expected
invalid swap entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and
the swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 326439428daf..f2face8b889e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -559,6 +559,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true, page_allocated;
@@ -572,6 +573,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e9a6effa500526e2a19d5ad042cb758b55b1ef93 Mon Sep 17 00:00:00 2001
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Date: Wed, 15 Nov 2017 17:33:15 -0800
Subject: [PATCH] mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries
after the fault swap entry. The readahead algorithm calculates some of
the swap entries to readahead via increasing the offset of the fault
swap entry without checking whether they are beyond the end of the swap
device and it relys on the __swp_swapcount() and swapcache_prepare() to
check it. Although __swp_swapcount() checks for the swap entry passed
in, it will complain with the error message as follow for the expected
invalid swap entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and
the swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 326439428daf..f2face8b889e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -559,6 +559,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true, page_allocated;
@@ -572,6 +573,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
This is a note to let you know that I've just added the patch titled
mm, swap: fix false error message in __swp_swapcount()
to the 4.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-swap-fix-false-error-message-in-__swp_swapcount.patch
and it can be found in the queue-4.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e9a6effa500526e2a19d5ad042cb758b55b1ef93 Mon Sep 17 00:00:00 2001
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Date: Wed, 15 Nov 2017 17:33:15 -0800
Subject: mm, swap: fix false error message in __swp_swapcount()
From: Huang Ying <huang.ying.caritas(a)gmail.com>
commit e9a6effa500526e2a19d5ad042cb758b55b1ef93 upstream.
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries
after the fault swap entry. The readahead algorithm calculates some of
the swap entries to readahead via increasing the offset of the fault
swap entry without checking whether they are beyond the end of the swap
device and it relys on the __swp_swapcount() and swapcache_prepare() to
check it. Although __swp_swapcount() checks for the swap entry passed
in, it will complain with the error message as follow for the expected
invalid swap entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and
the swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/swap_state.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -506,6 +506,7 @@ struct page *swapin_readahead(swp_entry_
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true;
@@ -519,6 +520,8 @@ struct page *swapin_readahead(swp_entry_
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
Patches currently in stable-queue which might be from huang.ying.caritas(a)gmail.com are
queue-4.13/mm-swap-fix-false-error-message-in-__swp_swapcount.patch