Hi Andrew,
Here is another device-dax fix that requires touching some mm code. When
device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint. Instead, these patches introduce a new ->split() vm
operation.
---
Dan Williams (2):
mm, hugetlbfs: introduce ->split() to vm_operations_struct
device-dax: implement ->split() to catch invalid munmap attempts
drivers/dax/device.c | 12 ++++++++++++
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
4 files changed, 26 insertions(+), 3 deletions(-)
The patch titled
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
has been added to the -mm tree. Its filename is
mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-fix-an-incorrect-call-o…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-fix-an-incorrect-call-o…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
non-THP pages for migration, when THP is on but THP migration is not
enabled. This leads to a bad state of target pages for migration.
This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.
Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/migrate.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page include/linux/migrate.h
--- a/include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page
+++ a/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_node
new_page = __alloc_pages_nodemask(gfp_mask, order,
preferred_nid, nodemask);
- if (new_page && PageTransHuge(page))
+ if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
return new_page;
_
Patches currently in -mm which might be from zi.yan(a)cs.rutgers.edu are
mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
From: Jeff Mahoney <jeffm(a)suse.com>
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans) the assumption that
btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has
been false. The qgroup operations call into btrfs_search_slot
and friends and can now return the full spectrum of error codes.
Fortunately, the fix here is easy since update_ref_for_cow failing
is already handled so we just need to bail early with the error
code.
Fixes: fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting ...)
Cc: <stable(a)vger.kernel.org> # v4.11+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
---
fs/btrfs/ctree.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 531e0a8645b0..1e74cf826532 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1032,14 +1032,17 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) &&
!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) {
ret = btrfs_inc_ref(trans, root, buf, 1);
- BUG_ON(ret); /* -ENOMEM */
+ if (ret)
+ return ret;
if (root->root_key.objectid ==
BTRFS_TREE_RELOC_OBJECTID) {
ret = btrfs_dec_ref(trans, root, buf, 0);
- BUG_ON(ret); /* -ENOMEM */
+ if (ret)
+ return ret;
ret = btrfs_inc_ref(trans, root, cow, 1);
- BUG_ON(ret); /* -ENOMEM */
+ if (ret)
+ return ret;
}
new_flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
} else {
@@ -1049,7 +1052,8 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
ret = btrfs_inc_ref(trans, root, cow, 1);
else
ret = btrfs_inc_ref(trans, root, cow, 0);
- BUG_ON(ret); /* -ENOMEM */
+ if (ret)
+ return ret;
}
if (new_flags != 0) {
int level = btrfs_header_level(buf);
@@ -1068,9 +1072,11 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
ret = btrfs_inc_ref(trans, root, cow, 1);
else
ret = btrfs_inc_ref(trans, root, cow, 0);
- BUG_ON(ret); /* -ENOMEM */
+ if (ret)
+ return ret;
ret = btrfs_dec_ref(trans, root, buf, 1);
- BUG_ON(ret); /* -ENOMEM */
+ if (ret)
+ return ret;
}
clean_tree_block(fs_info, buf);
*last_ref = 1;
--
2.14.2
Changes since v2 [1]:
* Switch from the "#define __HAVE_ARCH_PUD_WRITE" to "#define
pud_write". This incidentally fixes a powerpc compile error.
(Stephen)
* Add a cleanup patch to align pmd_write to the pud_write definition
scheme.
---
Andrew,
Here is another attempt at the pud_write() fix [2], and some follow-on
patches to use the '_access_permitted' helpers in fault and
get_user_pages() paths where we are checking if the thread has access to
write. I explicitly omit conversions for places where the kernel is
checking the _PAGE_RW flag for kernel purposes, not for userspace
access.
Beyond fixing the crash, this series also fixes get_user_pages() and
fault paths to honor protection keys in the same manner as
get_user_pages_fast(). Only the crash fix is tagged for -stable as the
protection key check is done just for consistency reasons since
userspace can change protection keys at will.
These have received a build success notification from the 0day robot.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-November/013254.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-November/013237.html
---
Dan Williams (5):
mm: fix device-dax pud write-faults triggered by get_user_pages()
mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
mm: replace pud_write with pud_access_permitted in fault + gup paths
mm: replace pmd_write with pmd_access_permitted in fault + gup paths
mm: replace pte_write with pte_access_permitted in fault + gup paths
arch/arm/include/asm/pgtable-3level.h | 1 -
arch/arm64/include/asm/pgtable.h | 1 -
arch/mips/include/asm/pgtable.h | 2 +-
arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
arch/s390/include/asm/pgtable.h | 8 +++++++-
arch/sparc/include/asm/pgtable_64.h | 2 +-
arch/sparc/mm/gup.c | 4 ++--
arch/tile/include/asm/pgtable.h | 1 -
arch/x86/include/asm/pgtable.h | 8 +++++++-
fs/dax.c | 3 ++-
include/asm-generic/pgtable.h | 12 ++++++++++--
include/linux/hugetlb.h | 8 --------
mm/gup.c | 2 +-
mm/hmm.c | 8 ++++----
mm/huge_memory.c | 6 +++---
mm/memory.c | 8 ++++----
16 files changed, 42 insertions(+), 33 deletions(-)
On 11/21/2017 11:48 AM, Leon Romanovsky wrote:
> On Tue, Nov 21, 2017 at 09:36:48AM -0700, Jason Gunthorpe wrote:
>> On Tue, Nov 21, 2017 at 06:34:54PM +0200, Leon Romanovsky wrote:
>>> On Tue, Nov 21, 2017 at 09:04:42AM -0700, Jason Gunthorpe wrote:
>>>> On Tue, Nov 21, 2017 at 09:37:27AM -0600, Daniel Jurgens wrote:
>>>>
>>>>> The only warning that would make sense is if the mixed ports aren't
>>>>> all IB or RoCE. As you note, CX-3 can mix those two, we don't want to
>>>>> see warnings about that.
>>>>
>>>> I would really like to see cx3 be changed to not do that, then we
>>>> could finalize this issue upstream: All device ports must be the same
>>>> protocol.
>>>
>>> I don't see the point of such artificial limitation, the users who
>>> brought CX-3 have option to work in mixed mode and IMHO it is not right
>>> to deprecate such ability just because it is hard for us to code for it.
>>
>> I don't really think it is really too user visible.. Only the device
>> and port number change, but only if running in mixed mode.
>
> Ahh, correct me if I'm wrong, you are proposing to split mlx4_ib devices
> to two devices once it is configured in mixed mode, so everyone will
> have one port only. Did I understand you correctly?
>
Which would require splitting resources that are shared now? other splitting issue(s)?
>>
>> It is not just 'hard for us' it is impossible to reconcile the
>> differences between ports when enforcing device level things.
>>
>> This keeps coming up again and again..
>>
>> Jason
On 11/21/2017 11:34 AM, Leon Romanovsky wrote:
> On Tue, Nov 21, 2017 at 09:04:42AM -0700, Jason Gunthorpe wrote:
>> On Tue, Nov 21, 2017 at 09:37:27AM -0600, Daniel Jurgens wrote:
>>
>>> The only warning that would make sense is if the mixed ports aren't
>>> all IB or RoCE. As you note, CX-3 can mix those two, we don't want to
>>> see warnings about that.
>>
>> I would really like to see cx3 be changed to not do that, then we
>> could finalize this issue upstream: All device ports must be the same
>> protocol.
>
> I don't see the point of such artificial limitation, the users who
> brought CX-3 have option to work in mixed mode and IMHO it is not right
> to deprecate such ability just because it is hard for us to code for it.
>
We use cx-3 (& cx-4 & cx-5) in mixed mode all over our RDMA cluster.
Loosing that feature is not an option.
> Thanks
>
>>
>> Jason
On Tue, Nov 21, 2017 at 06:48:02PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 21, 2017 at 09:36:48AM -0700, Jason Gunthorpe wrote:
> Ahh, correct me if I'm wrong, you are proposing to split mlx4_ib devices
> to two devices once it is configured in mixed mode, so everyone will
> have one port only. Did I understand you correctly?
Right, but only for mixed mode. dual port roce or dual port ib is not
altered.
Jason
From: Adam Wallis <awallis(a)codeaurora.org>
commit a9df21e34b422f79d9a9fa5c3eff8c2a53491be6 upstream.
This patch was backported and only needed a line adjustment.
Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
introduced a bug (that is in fact documented by the patch commit text)
that leaves behind a dangling pointer. Since the done_wait structure is
allocated on the stack, future invocations to the DMATEST can produce
undesirable results (e.g., corrupted spinlocks). Ideally, this would be
cleaned up in the thread handler, but at the very least, the kernel
is left in a very precarious scenario that can lead to some long debug
sessions when the crash comes later.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197605
Signed-off-by: Adam Wallis <awallis(a)codeaurora.org>
Signed-off-by: Vinod Koul <vinod.koul(a)intel.com>
---
drivers/dma/dmatest.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index cf76fc6..fbb7551 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -666,6 +666,7 @@ static int dmatest_func(void *data)
* free it this time?" dancing. For now, just
* leave it dangling.
*/
+ WARN(1, "dmatest: Kernel stack may be corrupted!!\n");
dmaengine_unmap_put(um);
result("test timed out", total_tests, src_off, dst_off,
len, 0);
--
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
This is a note to let you know that I've just added the patch titled
serial: omap: Fix EFR write on RTS deassertion
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
serial-omap-fix-efr-write-on-rts-deassertion.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2a71de2f7366fb1aec632116d0549ec56d6a3940 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 21 Oct 2017 10:50:18 +0200
Subject: serial: omap: Fix EFR write on RTS deassertion
From: Lukas Wunner <lukas(a)wunner.de>
commit 2a71de2f7366fb1aec632116d0549ec56d6a3940 upstream.
Commit 348f9bb31c56 ("serial: omap: Fix RTS handling") sought to enable
auto RTS upon manual RTS assertion and disable it on deassertion.
However it seems the latter was done incorrectly, it clears all bits in
the Extended Features Register *except* auto RTS.
Fixes: 348f9bb31c56 ("serial: omap: Fix RTS handling")
Cc: Peter Hurley <peter(a)hurleysoftware.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/omap-serial.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/omap-serial.c
+++ b/drivers/tty/serial/omap-serial.c
@@ -693,7 +693,7 @@ static void serial_omap_set_mctrl(struct
if ((mctrl & TIOCM_RTS) && (port->status & UPSTAT_AUTORTS))
up->efr |= UART_EFR_RTS;
else
- up->efr &= UART_EFR_RTS;
+ up->efr &= ~UART_EFR_RTS;
serial_out(up, UART_EFR, up->efr);
serial_out(up, UART_LCR, lcr);
Patches currently in stable-queue which might be from lukas(a)wunner.de are
queue-4.9/serial-omap-fix-efr-write-on-rts-deassertion.patch