May 2018 - Linux-stable-mirror

+ radix-tree-fix-multi-order-iteration-race.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: radix tree: fix multi-order iteration race
has been added to the -mm tree.  Its filename is
     radix-tree-fix-multi-order-iteration-race.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/radix-tree-fix-multi-order-iterati…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/radix-tree-fix-multi-order-iterati…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Subject: radix tree: fix multi-order iteration race

Fix a race in the multi-order iteration code which causes the kernel to
hit a GP fault.  This was first seen with a production v4.15 based kernel
(4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
DAX entries.

The race has to do with how we tear down multi-order sibling entries when
we are removing an item from the tree.  Remember for example that an order
2 entry looks like this:

struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the three
slots following 'entry' contain sibling pointers which point back to
'entry.'

When we delete 'entry' from the tree, we call:

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then replaces 'entry' with NULL.  This means that for a brief period
of time we end up with one or more of the siblings removed, so:

struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in the
tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.  Normally
this works:

                                 V preceding slot
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                        ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                        V preceding slot
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                              ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

We fix this race by fixing the way that skip_siblings() detects sibling
nodes.  Instead of testing against the preceding slot we instead look for
siblings via is_sibling_entry() which compares against the position of
the struct radix_tree_node.slots[] array.  This ensures that sibling
entries are properly identified, even if they are no longer contiguous
with the 'entry' they point to.

Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators")
Signed-off-by: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Reported-by: CR, Sapthagirish <sapthagirish.cr(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 lib/radix-tree.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff -puN lib/radix-tree.c~radix-tree-fix-multi-order-iteration-race lib/radix-tree.c
--- a/lib/radix-tree.c~radix-tree-fix-multi-order-iteration-race
+++ a/lib/radix-tree.c
@@ -1612,11 +1612,9 @@ static void set_iter_tags(struct radix_t
 static void __rcu **skip_siblings(struct radix_tree_node **nodep,
 			void __rcu **slot, struct radix_tree_iter *iter)
 {
-	void *sib = node_to_entry(slot - 1);
-
 	while (iter->index < iter->next_index) {
 		*nodep = rcu_dereference_raw(*slot);
-		if (*nodep && *nodep != sib)
+		if (*nodep && !is_sibling_entry(iter->node, *nodep))
 			return slot;
 		slot++;
 		iter->index = __radix_tree_iter_add(iter, 1);
@@ -1631,7 +1629,7 @@ void __rcu **__radix_tree_next_slot(void
 				struct radix_tree_iter *iter, unsigned flags)
 {
 	unsigned tag = flags & RADIX_TREE_ITER_TAG_MASK;
-	struct radix_tree_node *node = rcu_dereference_raw(*slot);
+	struct radix_tree_node *node;

 	slot = skip_siblings(&node, slot, iter);

_

Patches currently in -mm which might be from ross.zwisler(a)linux.intel.com are

radix-tree-test-suite-fix-mapshift-build-target.patch
radix-tree-test-suite-fix-compilation-issue.patch
radix-tree-test-suite-add-item_delete_rcu.patch
radix-tree-test-suite-multi-order-iteration-race.patch
radix-tree-fix-multi-order-iteration-race.patch
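The following is a minimal user-space sketch of the skip_siblings() change, for illustration only. `struct fake_node`, `MAP_SIZE`, `INTERNAL_BIT` and all helper names are hypothetical stand-ins for struct radix_tree_node, RADIX_TREE_MAP_SIZE and the kernel's internal-entry tagging. The range check imitates what the message says is_sibling_entry() does (compare against the position of the slots[] array), not its exact source. The model rebuilds the [entry][NULL][sibling][sibling] state and compares the old "exact match on the preceding slot" test with the range test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE     8    /* hypothetical; the kernel uses RADIX_TREE_MAP_SIZE */
#define INTERNAL_BIT 1UL  /* hypothetical tag bit marking internal entries */

struct fake_node {
	void *slots[MAP_SIZE];
};

/* A sibling entry is a tagged pointer to the slot holding the real entry. */
static void *make_sibling(struct fake_node *n, size_t idx)
{
	return (void *)((uintptr_t)&n->slots[idx] | INTERNAL_BIT);
}

/* Range test in the spirit of is_sibling_entry(): does it point into slots[]? */
static bool points_into_slots(const struct fake_node *n, const void *entry)
{
	uintptr_t p = (uintptr_t)entry;
	uintptr_t lo = (uintptr_t)&n->slots[0];
	uintptr_t hi = (uintptr_t)&n->slots[MAP_SIZE];

	return p >= lo && p < hi;
}

/* Old logic (start must be >= 1): only "sibling of slot start - 1" is skipped. */
static size_t skip_siblings_old(const struct fake_node *n, size_t start)
{
	uintptr_t sib = (uintptr_t)&n->slots[start - 1] | INTERNAL_BIT;
	size_t i;

	for (i = start; i < MAP_SIZE; i++)
		if (n->slots[i] && (uintptr_t)n->slots[i] != sib)
			break;
	return i;  /* index of the first slot treated as a real entry */
}

/* Fixed logic: any entry that points into slots[] is skipped as a sibling. */
static size_t skip_siblings_new(const struct fake_node *n, size_t start)
{
	size_t i;

	for (i = start; i < MAP_SIZE; i++)
		if (n->slots[i] && !points_into_slots(n, n->slots[i]))
			break;
	return i;
}
```

With the first sibling cleared, the old test stops at slot 2 and reports a sibling as if it were a real entry. This corresponds to the bogus node walk described in the message. The range test still skips ahead to slot 4.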


Backport commit b7363e67 to stable 4.9.y

by Raju Rangoju

Hi Greg,

Could you please backport the below commit to the 4.9 stable tree? With a
CPU-bound workqueue we see much lower CPU utilization and the same IOPS
performance.

commit b7363e67b23e04c23c2a99437feefac7292a88bc
Author: Sagi Grimberg <sagi(a)grimberg.me>
Date:   Wed Mar 8 22:03:17 2017 +0200

    IB/device: Convert ib-comp-wq to be CPU-bound

Thanks,
Raju


[PATCH 2/2] can: hi311x: Work around TX complete interrupt erratum

by Marc Kleine-Budde

From: Lukas Wunner <lukas(a)wunner.de>

When sending packets as fast as possible using "cangen -g 0 -i -x", the
HI-3110 occasionally latches the interrupt pin high on completion of a
packet, but doesn't set the TXCPLT bit in the INTF register.  The INTF
register contains 0x00 as if no interrupt has occurred.  Even waiting
for a few milliseconds after the interrupt doesn't help.

Work around this apparent erratum by instead checking the TXMTY bit in
the STATF register ("TX FIFO empty").  We know that we've queued up a
packet for transmission if priv->tx_len is nonzero.  If the TX FIFO is
empty, transmission of that packet must have completed.

Note that this is congruent with our handling of received packets, which
likewise gleans from the STATF register whether a packet is waiting in
the RX FIFO, instead of looking at the INTF register.

Cc: Mathias Duckeck <m.duckeck(a)kunbus.de>
Cc: Akshay Bhat <akshay.bhat(a)timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick(a)timesys.com>
Cc: stable(a)vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Acked-by: Akshay Bhat <akshay.bhat(a)timesys.com>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
 drivers/net/can/spi/hi311x.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index c2cf254e4e95..53e320c92a8b 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -91,6 +91,7 @@
 #define HI3110_STAT_BUSOFF BIT(2)
 #define HI3110_STAT_ERRP BIT(3)
 #define HI3110_STAT_ERRW BIT(4)
+#define HI3110_STAT_TXMTY BIT(7)

 #define HI3110_BTR0_SJW_SHIFT 6
 #define HI3110_BTR0_BRP_SHIFT 0
@@ -737,10 +738,7 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
 			}
 		}

-		if (intf == 0)
-			break;
-
-		if (intf & HI3110_INT_TXCPLT) {
+		if (priv->tx_len && statf & HI3110_STAT_TXMTY) {
 			net->stats.tx_packets++;
 			net->stats.tx_bytes += priv->tx_len - 1;
 			can_led_event(net, CAN_LED_EVENT_TX);
@@ -750,6 +748,9 @@ static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
 			}
 			netif_wake_queue(net);
 		}
+
+		if (intf == 0)
+			break;
 	}
 	mutex_unlock(&priv->hi3110_lock);
 	return IRQ_HANDLED;
-- 
2.17.0
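The workaround can be sketched as a pure function of the driver state and the two status registers. This is a hypothetical model, not driver code. `struct fake_hi3110`, `hi3110_tx_done()` and `ist_pass()` are invented names, and only the value of HI3110_STAT_TXMTY (BIT(7)) comes from the patch. Clearing tx_len after accounting is an assumption about driver code that the diff context elides.

```c
#include <stdbool.h>
#include <stdint.h>

#define HI3110_STAT_TXMTY (1U << 7)  /* BIT(7), as added by the patch */

/* Hypothetical stand-in for the driver's private state and netdev stats. */
struct fake_hi3110 {
	unsigned int tx_len;      /* 1 + data length while a frame is queued, else 0 (assumed) */
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* TX completion as inferred by the patch: a frame is queued and the TX FIFO is empty. */
static bool hi3110_tx_done(const struct fake_hi3110 *priv, uint8_t statf)
{
	return priv->tx_len && (statf & HI3110_STAT_TXMTY);
}

/*
 * One pass of the interrupt-thread loop body after the patch: TX completion
 * is accounted before the intf == 0 check, so a latched interrupt that left
 * INTF at 0x00 no longer skips it. Returns false when the loop should break.
 */
static bool ist_pass(struct fake_hi3110 *priv, uint8_t statf, uint8_t intf)
{
	if (hi3110_tx_done(priv, statf)) {
		priv->tx_packets++;
		priv->tx_bytes += priv->tx_len - 1;
		priv->tx_len = 0;  /* assumed: the driver clears tx_len once the frame is echoed */
	}
	return intf != 0;
}
```

Because completion is keyed on `tx_len`, a second pass with TXMTY still set does not count the same frame twice.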


[PATCH 1/2] can: hi311x: Acquire SPI lock on ->do_get_berr_counter

by Marc Kleine-Budde

From: Lukas Wunner <lukas(a)wunner.de>

hi3110_get_berr_counter() may run concurrently to the rest of the driver
but neglects to acquire the lock protecting access to the SPI device.
As a result, it and the rest of the driver may clobber each other's tx
and rx buffers.

We became aware of this issue because transmission of packets with
"cangen -g 0 -i -x" frequently hung.  It turns out that agetty executes
->do_get_berr_counter every few seconds via the following call stack:

    CPU: 2 PID: 1605 Comm: agetty
    [<7f3f7500>] (hi3110_get_berr_counter [hi311x])
    [<7f130204>] (can_fill_info [can_dev])
    [<80693bc0>] (rtnl_fill_ifinfo)
    [<806949ec>] (rtnl_dump_ifinfo)
    [<806b4834>] (netlink_dump)
    [<806b4bc8>] (netlink_recvmsg)
    [<8065f180>] (sock_recvmsg)
    [<80660f90>] (___sys_recvmsg)
    [<80661e7c>] (__sys_recvmsg)
    [<80661ec0>] (SyS_recvmsg)
    [<80108b20>] (ret_fast_syscall+0x0/0x1c)

agetty listens to netlink messages in order to update the login prompt
when IP addresses change (if /etc/issue contains \4 or \6 escape codes):
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=e…

It's a useful feature, though it seems questionable that it causes CAN
bit error statistics to be queried.

Be that as it may, if hi3110_get_berr_counter() is invoked while a frame
is sent by hi3110_hw_tx(), bogus SPI transfers like the following may
occur:

    => 12 00             (hi3110_get_berr_counter() wanted to transmit
                          EC 00 to query the transmit error counter,
                          but the first byte was overwritten by
                          hi3110_hw_tx_frame())

    => EA 00 3E 80 01 FB (hi3110_hw_tx_frame() wanted to transmit a
                          frame, but the first byte was overwritten by
                          hi3110_get_berr_counter() because it wanted
                          to query the receive error counter)

This sequence hangs the transmission because the driver believes it has
sent a frame and waits for the interrupt signaling completion, but in
reality the chip has never sent away the frame since the commands it
received were malformed.

Fix by acquiring the SPI lock in hi3110_get_berr_counter().

I've scrutinized the entire driver for further unlocked SPI accesses but
found no others.

Cc: Mathias Duckeck <m.duckeck(a)kunbus.de>
Cc: Akshay Bhat <akshay.bhat(a)timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick(a)timesys.com>
Cc: Stef Walter <stefw(a)redhat.com>
Cc: Karel Zak <kzak(a)redhat.com>
Cc: stable(a)vger.kernel.org # v4.12+
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Reviewed-by: Akshay Bhat <akshay.bhat(a)timesys.com>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
 drivers/net/can/spi/hi311x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
index 5590c559a8ca..c2cf254e4e95 100644
--- a/drivers/net/can/spi/hi311x.c
+++ b/drivers/net/can/spi/hi311x.c
@@ -427,8 +427,10 @@ static int hi3110_get_berr_counter(const struct net_device *net,
 	struct hi3110_priv *priv = netdev_priv(net);
 	struct spi_device *spi = priv->spi;

+	mutex_lock(&priv->hi3110_lock);
 	bec->txerr = hi3110_read(spi, HI3110_READ_TEC);
 	bec->rxerr = hi3110_read(spi, HI3110_READ_REC);
+	mutex_unlock(&priv->hi3110_lock);

 	return 0;
 }
-- 
2.17.0
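The fix follows the usual rule that every user of a shared transfer buffer holds the same lock for the whole stage-and-send sequence. Below is a sketch with POSIX threads, under these stated assumptions:

- `struct fake_spi`, `spi_xfer()` and the thread functions are hypothetical.
- The mutex stands in for priv->hi3110_lock.
- The byte values are assumed from the transfers quoted in the message.

Only the locked variant is shown, because running it without the mutex would be a data race with undefined behavior in C.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of shared SPI state; not the driver's real layout. */
struct fake_spi {
	pthread_mutex_t lock;     /* plays the role of priv->hi3110_lock */
	unsigned char tx_buf[8];  /* one buffer shared by every SPI user */
	unsigned long clobbered;  /* transfers whose bytes changed before "sending" */
};

/* Stage a command in the shared buffer, then check it is still intact. */
static void spi_xfer(struct fake_spi *s, const unsigned char *cmd, size_t len)
{
	memcpy(s->tx_buf, cmd, len);
	if (memcmp(s->tx_buf, cmd, len) != 0)
		s->clobbered++;
}

static void locked_xfer_loop(struct fake_spi *s, const unsigned char *cmd, size_t len)
{
	for (int i = 0; i < 20000; i++) {
		pthread_mutex_lock(&s->lock);  /* what the fix adds to get_berr_counter() */
		spi_xfer(s, cmd, len);
		pthread_mutex_unlock(&s->lock);
	}
}

/* Assumed bytes: EC 00 queries the TX error counter, the second is a TX frame. */
static void *berr_counter_thread(void *arg)
{
	static const unsigned char read_tec[] = { 0xEC, 0x00 };

	locked_xfer_loop(arg, read_tec, sizeof(read_tec));
	return NULL;
}

static void *tx_frame_thread(void *arg)
{
	static const unsigned char frame[] = { 0x12, 0x00, 0x3E, 0x80, 0x01, 0xFB };

	locked_xfer_loop(arg, frame, sizeof(frame));
	return NULL;
}

/* Run both users concurrently; returns the number of corrupted transfers. */
static unsigned long run_concurrent_users(void)
{
	struct fake_spi s = { .lock = PTHREAD_MUTEX_INITIALIZER };
	pthread_t a, b;

	pthread_create(&a, NULL, berr_counter_thread, &s);
	pthread_create(&b, NULL, tx_frame_thread, &s);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return s.clobbered;
}
```

run_concurrent_users() is expected to return 0 because each stage-and-check sequence runs entirely under the mutex.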


[PATCH] mtd: rawnand: micron: Fix support for on-die ECC

by Boris Brezillon

It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
which leads all READ operations following the failing one to report an
ECC failure.

Reset the chip to clear the NAND_STATUS_FAIL bit.

Note that this behavior is not documented in the datasheet, but
resetting the chip is the only solution we found to fix the problem.

Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)bootlin.com>
Cc: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Cc: Bean Huo <beanhuo(a)micron.com>
Cc: Peter Pan <peterpandong(a)micron.com>
---
Peter, Bean,

Can you confirm this behavior, or ask someone at Micron who can confirm
it? Also, if a RESET is actually needed, it would be good to update the
datasheet accordingly. And if that's not the case, can you explain why
the NAND_STATUS_FAIL bit is stuck and how to clear it? (I tried a 0x00
command, a.k.a. READ STATUS EXIT, but it does not clear this bit. ERASE
and PROGRAM seem to clear the bit, but that's clearly not the kind of
operation I can do when the user asks for a READ.)

Thanks,

Boris
---
 drivers/mtd/nand/raw/nand_micron.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 0af45b134c0c..a915f568f6a3 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -153,6 +153,23 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
 		ret = nand_read_data_op(chip, chip->oob_poi, mtd->oobsize,
 					false);

+	/*
+	 * Looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
+	 * which leads all READ operations following the failing one to report
+	 * an ECC failure.
+	 * Reset the chip to clear it.
+	 *
+	 * Note that this behavior is not document in the datasheet, but
+	 * resetting the chip is the only solution we found to clear this bit.
+	 */
+	if (status & NAND_STATUS_FAIL) {
+		int cs = page >> (chip->chip_shift - chip->page_shift);
+
+		chip->select_chip(mtd, -1);
+		nand_reset(chip, cs);
+		chip->select_chip(mtd, cs);
+	}
+
 out:
 	micron_nand_on_die_ecc_setup(chip, false);

-- 
2.14.1
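The reset path needs the chip select that owns the failing page. The expression `page >> (chip->chip_shift - chip->page_shift)` divides the page index by the number of pages per chip. This assumes chip_shift and page_shift are the address bits of one chip and of one page, which is how struct nand_chip documents them. The helper name and the geometry below (2 KiB pages, 1 GiB per chip) are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/*
 * Chip select (die) that a global page index falls on, using the same
 * expression as the patch: pages per chip = 2^(chip_shift - page_shift).
 */
static int page_to_cs(int page, int chip_shift, int page_shift)
{
	assert(chip_shift >= page_shift);
	return page >> (chip_shift - page_shift);
}
```

With 2^30 / 2^11 = 524288 pages per chip, page 524287 is the last page of chip 0 and page 524288 is the first page of chip 1.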


[PATCH] swiotlb: fix inversed DMA_ATTR_NO_WARN test

by Prasanthi Chellakumar

From: Michel Dänzer <michel.daenzer(a)amd.com>

The result was printing the warning only when we were explicitly asked
not to.

Cc: stable(a)vger.kernel.org
Fixes: 0176adb004065d6815a8e67946752df4cd947c5b "swiotlb: refactor coherent buffer allocation"
Signed-off-by: Michel Dänzer <michel.daenzer(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
---
 lib/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index fece575..12fbaa4 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -737,7 +737,7 @@ swiotlb_alloc_buffer(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	swiotlb_tbl_unmap_single(dev, phys_addr, size, DMA_TO_DEVICE,
 			DMA_ATTR_SKIP_CPU_SYNC);
 out_warn:
-	if ((attrs & DMA_ATTR_NO_WARN) && printk_ratelimit()) {
+	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit()) {
 		dev_warn(dev,
 			"swiotlb: coherent allocation failed, size=%zu\n",
 			size);
-- 
2.7.4
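The bug is a single missing `!`. Here is a hedged sketch of the two predicates, leaving out printk_ratelimit(). The helper names are hypothetical. The value of DMA_ATTR_NO_WARN is believed to match the kernel header of that era but is only illustrative here.

```c
#include <stdbool.h>

/* Believed to match include/linux/dma-mapping.h at the time; illustrative only. */
#define DMA_ATTR_NO_WARN (1UL << 8)

/* Before the fix: warns only when the caller asked for silence. */
static bool should_warn_before(unsigned long attrs)
{
	return (attrs & DMA_ATTR_NO_WARN) != 0;
}

/* After the fix: warns unless the caller passed DMA_ATTR_NO_WARN. */
static bool should_warn_after(unsigned long attrs)
{
	return !(attrs & DMA_ATTR_NO_WARN);
}
```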


+ lib-test_bitmapc-fix-bitmap-optimisation-tests-to-report-errors-correctly.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly
has been added to the -mm tree.  Its filename is
     lib-test_bitmapc-fix-bitmap-optimisation-tests-to-report-errors-correctly.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/lib-test_bitmapc-fix-bitmap-optimi…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/lib-test_bitmapc-fix-bitmap-optimi…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Matthew Wilcox <mawilcox(a)microsoft.com>
Subject: lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly

I had neglected to increment the error counter when the tests failed,
which made the tests noisy when they fail, but not actually return an
error code.

Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au
Fixes: 3cc78125a081 ("lib/test_bitmap.c: add optimisation tests")
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Reported-by: Michael Ellerman <mpe(a)ellerman.id.au>
Tested-by: Michael Ellerman <mpe(a)ellerman.id.au>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Cc: Yury Norov <ynorov(a)caviumnetworks.com>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: <stable(a)vger.kernel.org>	[4.13+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 lib/test_bitmap.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff -puN lib/test_bitmap.c~lib-test_bitmapc-fix-bitmap-optimisation-tests-to-report-errors-correctly lib/test_bitmap.c
--- a/lib/test_bitmap.c~lib-test_bitmapc-fix-bitmap-optimisation-tests-to-report-errors-correctly
+++ a/lib/test_bitmap.c
@@ -331,23 +331,32 @@ static void noinline __init test_mem_opt
 	unsigned int start, nbits;

 	for (start = 0; start < 1024; start += 8) {
-		memset(bmap1, 0x5a, sizeof(bmap1));
-		memset(bmap2, 0x5a, sizeof(bmap2));
 		for (nbits = 0; nbits < 1024 - start; nbits += 8) {
+			memset(bmap1, 0x5a, sizeof(bmap1));
+			memset(bmap2, 0x5a, sizeof(bmap2));
+
 			bitmap_set(bmap1, start, nbits);
 			__bitmap_set(bmap2, start, nbits);
-			if (!bitmap_equal(bmap1, bmap2, 1024))
+			if (!bitmap_equal(bmap1, bmap2, 1024)) {
 				printk("set not equal %d %d\n", start, nbits);
-			if (!__bitmap_equal(bmap1, bmap2, 1024))
+				failed_tests++;
+			}
+			if (!__bitmap_equal(bmap1, bmap2, 1024)) {
 				printk("set not __equal %d %d\n", start, nbits);
+				failed_tests++;
+			}

 			bitmap_clear(bmap1, start, nbits);
 			__bitmap_clear(bmap2, start, nbits);
-			if (!bitmap_equal(bmap1, bmap2, 1024))
+			if (!bitmap_equal(bmap1, bmap2, 1024)) {
 				printk("clear not equal %d %d\n", start, nbits);
-			if (!__bitmap_equal(bmap1, bmap2, 1024))
+				failed_tests++;
+			}
+			if (!__bitmap_equal(bmap1, bmap2, 1024)) {
 				printk("clear not __equal %d %d\n", start,
 									nbits);
+				failed_tests++;
+			}
 		}
 	}
 }
_

Patches currently in -mm which might be from mawilcox(a)microsoft.com are

lib-test_bitmapc-fix-bitmap-optimisation-tests-to-report-errors-correctly.patch
slab-__gfp_zero-is-incompatible-with-a-constructor.patch
ida-remove-simple_ida_lock.patch
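The same pattern in self-contained form: reset the inputs on every iteration, and make every mismatch bump the failure counter so the harness can return a non-zero result. This is a hypothetical sketch, not lib/test_bitmap.c, and the slow/fast helpers are invented. It uses a plain byte array (bit i in byte i / 8) rather than the kernel's unsigned long bitmaps, so that the byte-wise memset() path lines up with the bit numbering on any host.

```c
#include <stdio.h>
#include <string.h>

#define NBITS  1024
#define NBYTES (NBITS / 8)

static unsigned int failed_tests;

/* Reference path: one bit at a time; bit i lives in byte i / 8, position i % 8. */
static void slow_set(unsigned char *map, unsigned int start, unsigned int nbits)
{
	for (unsigned int i = start; i < start + nbits; i++)
		map[i / 8] |= (unsigned char)(1u << (i % 8));
}

static void slow_clear(unsigned char *map, unsigned int start, unsigned int nbits)
{
	for (unsigned int i = start; i < start + nbits; i++)
		map[i / 8] &= (unsigned char)~(1u << (i % 8));
}

/* "Optimised" path: whole bytes via memset, valid only for multiples of 8. */
static void fast_set(unsigned char *map, unsigned int start, unsigned int nbits)
{
	memset(map + start / 8, 0xff, nbits / 8);
}

static void fast_clear(unsigned char *map, unsigned int start, unsigned int nbits)
{
	memset(map + start / 8, 0x00, nbits / 8);
}

static void expect_equal(const unsigned char *a, const unsigned char *b,
			 const char *what, unsigned int start, unsigned int nbits)
{
	if (memcmp(a, b, NBYTES) != 0) {
		printf("%s not equal %u %u\n", what, start, nbits);
		failed_tests++;  /* the increment the original test was missing */
	}
}

static unsigned int test_mem_optimisations(void)
{
	unsigned char a[NBYTES], b[NBYTES];

	for (unsigned int start = 0; start < NBITS; start += 8) {
		for (unsigned int nbits = 0; nbits < NBITS - start; nbits += 8) {
			/* Reset per iteration so every case starts from the same pattern. */
			memset(a, 0x5a, sizeof(a));
			memset(b, 0x5a, sizeof(b));

			slow_set(a, start, nbits);
			fast_set(b, start, nbits);
			expect_equal(a, b, "set", start, nbits);

			slow_clear(a, start, nbits);
			fast_clear(b, start, nbits);
			expect_equal(a, b, "clear", start, nbits);
		}
	}
	return failed_tests;
}
```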


[PATCH 13/13] x86/pkeys: Do not special case protection key 0

by Dave Hansen

From: Dave Hansen <dave.hansen(a)linux.intel.com>

mm_pkey_is_allocated() treats pkey 0 as unallocated.  That is
inconsistent with the manpages, and also inconsistent with
mm->context.pkey_allocation_map.  Stop special-casing it and only
disallow values that are actually bad (< 0).

The end-user visible effect of this is that you can now use
pkey_mprotect() to set pkey=0.

This is a bit nicer than what Ram proposed[1] because it is simpler
and removes special-casing for pkey 0.  On the other hand, it does
allow applications to pkey_free() pkey-0, but that's just a silly
thing to do, so we are not going to protect against it.

The scenario that could happen is similar to what happens if you free
any other pkey that is in use: it might get reallocated later and used
to protect some other data.  The most likely scenario is that pkey-0
comes back from pkey_alloc(), an access-disable or write-disable bit
is set in PKRU for it, and the next stack access will SIGSEGV.  It's
not horribly different from if you mprotect()'d your stack or heap to
be unreadable or unwritable, which is generally very foolish, but also
not explicitly prevented by the kernel.

1. http://lkml.kernel.org/r/1522112702-27853-1-git-send-email-linuxram@us.ibm.…

Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: 58ab9a088dda ("x86/pkeys: Check against max pkey to avoid overflows")
Cc: stable(a)vger.kernel.org
Cc: Ram Pai <linuxram(a)us.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
---

 b/arch/x86/include/asm/mmu_context.h |    2 +-
 b/arch/x86/include/asm/pkeys.h       |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff -puN arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated arch/x86/include/asm/mmu_context.h
--- a/arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated	2018-05-09 09:20:24.362698393 -0700
+++ b/arch/x86/include/asm/mmu_context.h	2018-05-09 09:20:24.367698393 -0700
@@ -193,7 +193,7 @@ static inline int init_new_context(struc

 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 	if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
-		/* pkey 0 is the default and always allocated */
+		/* pkey 0 is the default and allocated implicitly */
 		mm->context.pkey_allocation_map = 0x1;
 		/* -1 means unallocated or invalid */
 		mm->context.execute_only_pkey = -1;
diff -puN arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated arch/x86/include/asm/pkeys.h
--- a/arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated	2018-05-09 09:20:24.364698393 -0700
+++ b/arch/x86/include/asm/pkeys.h	2018-05-09 09:20:24.367698393 -0700
@@ -51,10 +51,10 @@ bool mm_pkey_is_allocated(struct mm_stru
 {
 	/*
 	 * "Allocated" pkeys are those that have been returned
-	 * from pkey_alloc().  pkey 0 is special, and never
-	 * returned from pkey_alloc().
+	 * from pkey_alloc() or pkey 0 which is allocated
+	 * implicitly when the mm is created.
 	 */
-	if (pkey <= 0)
+	if (pkey < 0)
 		return false;
 	if (pkey >= arch_max_pkey())
 		return false;
_
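Here is a sketch of the allocation check before and after the change. The names are hypothetical. The 0x1 initial map comes from the diff above. The limit of 16 keys comes from the arch_max_pkey() definition visible in patch 09/13 of this series. The execute-only pkey check that 09/13 adds is deliberately left out.

```c
#include <stdbool.h>

#define FAKE_MAX_PKEY 16  /* arch_max_pkey() with OSPKE, per the series */

/* Hypothetical stand-in for the pkey bits of mm->context. */
struct fake_mm_context {
	unsigned short pkey_allocation_map;
};

static void fake_init_new_context(struct fake_mm_context *ctx)
{
	ctx->pkey_allocation_map = 0x1;  /* pkey 0 is allocated implicitly */
}

/* Before the patch: pkey 0 is rejected even though its bit is set. */
static bool pkey_is_allocated_before(const struct fake_mm_context *ctx, int pkey)
{
	if (pkey <= 0 || pkey >= FAKE_MAX_PKEY)
		return false;
	return ctx->pkey_allocation_map & (1U << pkey);
}

/* After the patch: only negative or out-of-range values are rejected. */
static bool pkey_is_allocated_after(const struct fake_mm_context *ctx, int pkey)
{
	if (pkey < 0 || pkey >= FAKE_MAX_PKEY)
		return false;
	return ctx->pkey_allocation_map & (1U << pkey);
}
```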


[PATCH 09/13] x86/pkeys: Override pkey when moving away from PROT_EXEC

by Dave Hansen

From: Dave Hansen <dave.hansen(a)linux.intel.com>

I got a bug report that the following code (roughly) was
causing a SIGSEGV:

	mprotect(ptr, size, PROT_EXEC);
	mprotect(ptr, size, PROT_NONE);
	mprotect(ptr, size, PROT_READ);
	*ptr = 100;

The problem is hit when mprotect(PROT_EXEC) implicitly assigns a
protection key to the VMA and makes that key ACCESS_DENY|WRITE_DENY.
The PROT_NONE mprotect() failed to remove the protection key, and the
PROT_NONE->PROT_READ change left the PTE usable but the pkey still in
place, which left the memory inaccessible.

To fix this, we ensure that we always "override" the pkey at mprotect()
if the VMA does not have execute-only permissions, but the VMA has the
execute-only pkey.

We had a check for PROT_READ/WRITE, but it did not work for PROT_NONE.
This entirely removes the PROT_* checks, which ensures that PROT_NONE
now works.

Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: 62b5f7d013f ("mm/core, x86/mm/pkeys: Add execute-only protection keys support")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Cc: stable(a)vger.kernel.org
Cc: Ram Pai <linuxram(a)us.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
---

 b/arch/x86/include/asm/pkeys.h |   12 +++++++++++-
 b/arch/x86/mm/pkeys.c          |   21 +++++++++++----------
 2 files changed, 22 insertions(+), 11 deletions(-)

diff -puN arch/x86/include/asm/pkeys.h~pkeys-abandon-exec-only-pkey-more-aggressively arch/x86/include/asm/pkeys.h
--- a/arch/x86/include/asm/pkeys.h~pkeys-abandon-exec-only-pkey-more-aggressively	2018-05-09 09:20:22.295698398 -0700
+++ b/arch/x86/include/asm/pkeys.h	2018-05-09 09:20:22.300698398 -0700
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_PKEYS_H
 #define _ASM_X86_PKEYS_H

+#define ARCH_DEFAULT_PKEY	0
+
 #define arch_max_pkey() (boot_cpu_has(X86_FEATURE_OSPKE) ? 16 : 1)

 extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
@@ -15,7 +17,7 @@ extern int __execute_only_pkey(struct mm
 static inline int execute_only_pkey(struct mm_struct *mm)
 {
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
-		return 0;
+		return ARCH_DEFAULT_PKEY;

 	return __execute_only_pkey(mm);
 }
@@ -56,6 +58,14 @@ bool mm_pkey_is_allocated(struct mm_stru
 		return false;
 	if (pkey >= arch_max_pkey())
 		return false;
+	/*
+	 * The exec-only pkey is set in the allocation map, but
+	 * is not available to any of the user interfaces like
+	 * mprotect_pkey().
+	 */
+	if (pkey == mm->context.execute_only_pkey)
+		return false;
+
 	return mm_pkey_allocation_map(mm) & (1U << pkey);
 }

diff -puN arch/x86/mm/pkeys.c~pkeys-abandon-exec-only-pkey-more-aggressively arch/x86/mm/pkeys.c
--- a/arch/x86/mm/pkeys.c~pkeys-abandon-exec-only-pkey-more-aggressively	2018-05-09 09:20:22.297698398 -0700
+++ b/arch/x86/mm/pkeys.c	2018-05-09 09:20:22.301698398 -0700
@@ -94,26 +94,27 @@ int __arch_override_mprotect_pkey(struct
 	 */
 	if (pkey != -1)
 		return pkey;
-	/*
-	 * Look for a protection-key-drive execute-only mapping
-	 * which is now being given permissions that are not
-	 * execute-only.  Move it back to the default pkey.
-	 */
-	if (vma_is_pkey_exec_only(vma) &&
-	    (prot & (PROT_READ|PROT_WRITE))) {
-		return 0;
-	}
+
 	/*
 	 * The mapping is execute-only.  Go try to get the
 	 * execute-only protection key.  If we fail to do that,
 	 * fall through as if we do not have execute-only
-	 * support.
+	 * support in this mm.
 	 */
 	if (prot == PROT_EXEC) {
 		pkey = execute_only_pkey(vma->vm_mm);
 		if (pkey > 0)
 			return pkey;
+	} else if (vma_is_pkey_exec_only(vma)) {
+		/*
+		 * Protections are *not* PROT_EXEC, but the mapping
+		 * is using the exec-only pkey.  This mapping was
+		 * PROT_EXEC and will no longer be.  Move back to
+		 * the default pkey.
+		 */
+		return ARCH_DEFAULT_PKEY;
 	}
+
 	/*
 	 * This is a vanilla, non-pkey mprotect (or we failed to
 	 * setup execute-only), inherit the pkey from the VMA we
_
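The reported mprotect() sequence can be replayed against a small model of __arch_override_mprotect_pkey(). Everything here is a hypothetical sketch, under these assumptions:

- `struct fake_vma`, `fake_mprotect()` and the `patched` switch are invented.
- The VMA's current protections stand in for vm_flags.
- pkey 1 is assumed to be the already-allocated execute-only key.
- The final "inherit the VMA's pkey" step is inferred from the comment where the diff cuts off.

```c
#include <stdbool.h>
#include <sys/mman.h>  /* PROT_NONE, PROT_READ, PROT_WRITE, PROT_EXEC */

#define ARCH_DEFAULT_PKEY 0

/* Hypothetical VMA model; not the kernel's struct vm_area_struct. */
struct fake_vma {
	int prot;            /* current protections, standing in for vm_flags */
	int pkey;            /* pkey currently assigned to the VMA */
	int exec_only_pkey;  /* the mm's execute-only pkey, assumed already allocated */
};

static bool fake_vma_is_exec_only(const struct fake_vma *vma)
{
	return (vma->prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) == PROT_EXEC &&
	       vma->pkey == vma->exec_only_pkey;
}

static int override_pkey(const struct fake_vma *vma, int prot, int pkey, bool patched)
{
	if (pkey != -1)
		return pkey;                      /* explicit pkey from the caller: keep it */
	if (prot == PROT_EXEC) {
		if (vma->exec_only_pkey > 0)
			return vma->exec_only_pkey;
	} else if (patched) {
		if (fake_vma_is_exec_only(vma))
			return ARCH_DEFAULT_PKEY;  /* leaving exec-only: back to pkey 0 */
	} else if (fake_vma_is_exec_only(vma) && (prot & (PROT_READ | PROT_WRITE))) {
		return ARCH_DEFAULT_PKEY;          /* old check: misses PROT_NONE */
	}
	return vma->pkey;                          /* otherwise inherit the VMA's pkey */
}

/* Plain mprotect(): no explicit pkey, so pass -1 and let the override decide. */
static void fake_mprotect(struct fake_vma *vma, int prot, bool patched)
{
	vma->pkey = override_pkey(vma, prot, -1, patched);
	vma->prot = prot;
}
```

In the unpatched model the key stays at 1 after PROT_NONE and PROT_READ, matching the SIGSEGV report. In the patched model the PROT_NONE step already returns the VMA to ARCH_DEFAULT_PKEY.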


v4.9.99 build: 0 failures 2 warnings (v4.9.99)

by Build bot for Mark Brown

Tree/Branch: v4.9.99
Git describe: v4.9.99
Commit: 04cd74a759 Linux 4.9.99

Build Time: 83 min 17 sec

Passed:   11 / 11   (100.00 %)
Failed:    0 / 11   (  0.00 %)

Errors: 0
Warnings: 2
Section Mismatches: 0

-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
      1 warnings    0 mismatches  : arm64-allmodconfig
      1 warnings    0 mismatches  : x86_64-allmodconfig

-------------------------------------------------------------------------------

Warnings Summary: 2
	  1 drivers/target/iscsi/iscsi_target.o: warning: objtool: iscsit_handle_task_mgt_cmd()+0x78b: sibling call from callable instruction with changed frame pointer
	  1 ../include/linux/sched.h:2349:56: warning: 'noio_flag' may be used uninitialized in this function [-Wmaybe-uninitialized]

===============================================================================
Detailed per-defconfig build reports below:

-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 1 warnings, 0 section mismatches

Warnings:
	../include/linux/sched.h:2349:56: warning: 'noio_flag' may be used uninitialized in this function [-Wmaybe-uninitialized]

-------------------------------------------------------------------------------
x86_64-allmodconfig : PASS, 0 errors, 1 warnings, 0 section mismatches

Warnings:
	drivers/target/iscsi/iscsi_target.o: warning: objtool: iscsit_handle_task_mgt_cmd()+0x78b: sibling call from callable instruction with changed frame pointer

-------------------------------------------------------------------------------

Passed with no errors, warnings or mismatches:

arm64-allnoconfig
arm-multi_v5_defconfig
arm-multi_v7_defconfig
x86_64-defconfig
arm-allmodconfig
arm-allnoconfig
x86_64-allnoconfig
arm-multi_v4t_defconfig
arm64-defconfig

close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

