Linux-stable-mirror December 2018

linux-stable-mirror@lists.linaro.org

310 participants
844 discussions

[PATCH] hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined

by Michal Hocko

From: Michal Hocko <mhocko(a)suse.com> We have received a bug report that an injected MCE about faulty memory prevents memory offline to succeed on 4.4 base kernel. The underlying reason was that the HWPoison page has an elevated reference count and the migration keeps failing. There are two problems with that. First of all it is dubious to migrate the poisoned page because we know that accessing that memory is possible to fail. Secondly it doesn't make any sense to migrate a potentially broken content and preserve the memory corruption over to a new location. Oscar has found out that 4.4 and the current upstream kernels behave slightly differently with his simply testcase === int main(void) { int ret; int i; int fd; char *array = malloc(4096); char *array_locked = malloc(4096); fd = open("/tmp/data", O_RDONLY); read(fd, array, 4095); for (i = 0; i < 4096; i++) array_locked[i] = 'd'; ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked)); if (ret) perror("mlock"); sleep (20); ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON); if (ret) perror("madvise"); for (i = 0; i < 4096; i++) array_locked[i] = 'd'; return 0; } === + offline this memory. In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU list kernel: [<ffffffff81019ac9>] dump_trace+0x59/0x340 kernel: [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170 kernel: [<ffffffff8101ac71>] show_stack+0x21/0x40 kernel: [<ffffffff8132bb90>] dump_stack+0x5c/0x7c kernel: [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0 kernel: [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160 kernel: [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100 kernel: [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0 kernel: [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70 kernel: [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs] kernel: [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200 kernel: [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0 kernel: [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660 kernel: [<ffffffff8120e50d>] __vfs_read+0xcd/0x140 kernel: [<ffffffff8120e9ea>] vfs_read+0x7a/0x120 kernel: [<ffffffff8121404b>] kernel_read+0x3b/0x50 kernel: [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0 kernel: [<ffffffff81215f08>] do_execve+0x28/0x30 kernel: [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130 kernel: [<ffffffff8161c045>] ret_from_fork+0x55/0x80 And that later confuses the hotremove path because an LRU page is attempted to be migrated and that fails due to an elevated reference count. It is quite possible that the reuse of the HWPoisoned page is some kind of fixed race condition but I am not really sure about that. With the upstream kernel the failure is slightly different. The page doesn't seem to have LRU bit set but isolate_movable_page simply fails and do_migrate_range simply puts all the isolated pages back to LRU and therefore no progress is made and scan_movable_pages finds same set of pages over and over again. Fix both cases by explicitly checking HWPoisoned pages before we even try to get a reference on the page, try to unmap it if it is still mapped. As explained by Naoya : Hwpoison code never unmapped those for no big reason because : Ksm pages never dominate memory, so we simply didn't have strong : motivation to save the pages. Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU HWPoison pages which shouldn't happen but I couldn't convince myself about that. Naoya has noted the following : Theoretically no such gurantee, because try_to_unmap() doesn't have a : guarantee of success and then memory_failure() returns immediately : when hwpoison_user_mappings fails. : Or the following code (comes after hwpoison_user_mappings block) also impli= : es : that the target page can still have PageLRU flag. : : /* : * Torn down by someone else? : */ : if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) { : action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED); : res =3D -EBUSY; : goto out; : } : : So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in : current version of your patch. Debugged-by: Oscar Salvador <osalvador(a)suse.com> Cc: stable Reviewed-by: Oscar Salvador <osalvador(a)suse.com> Tested-by: Oscar Salvador <osalvador(a)suse.com> Acked-by: David Hildenbrand <david(a)redhat.com> Acked-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com> Signed-off-by: Michal Hocko <mhocko(a)suse.com> --- Hi Andrew, this has been posted as an RFC [1] previously. It took 2 versions to get the patch right but it seems that this one should work reasonably well. I guess we want to have it in linux-next for some time but I do not expect many people do test MCEs + hotremove considering the breakage is old and nobody has noticed so far. [1] http://lkml.kernel.org/r/20181203100309.14784-1-mhocko@kernel.org mm/memory_hotplug.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index c6c42a7425e5..cfa1a2736876 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -34,6 +34,7 @@ #include <linux/hugetlb.h> #include <linux/memblock.h> #include <linux/compaction.h> +#include <linux/rmap.h> #include <asm/tlbflush.h> @@ -1366,6 +1367,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) pfn = page_to_pfn(compound_head(page)) + hpage_nr_pages(page) - 1; + /* + * HWPoison pages have elevated reference counts so the migration would + * fail on them. It also doesn't make any sense to migrate them in the + * first place. Still try to unmap such a page in case it is still mapped + * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep + * the unmap as the catch all safety net). + */ + if (PageHWPoison(page)) { + if (WARN_ON(PageLRU(page))) + isolate_lru_page(page); + if (page_mapped(page)) + try_to_unmap(page, TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS); + continue; + } + if (!get_page_unless_zero(page)) continue; /* -- 2.19.2

7 years

patch "xhci: Prevent U1/U2 link pm states if exit latency is too long" added to usb-linus

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled xhci: Prevent U1/U2 link pm states if exit latency is too long to my usb git tree which can be found at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git in the usb-linus branch. The patch will show up in the next release of the linux-next tree (usually sometime within the next 24 hours during the week.) The patch will hopefully also be merged in Linus's tree for the next -rc kernel release. If you have any questions about this process, please let me know. >From 0472bf06c6fd33c1a18aaead4c8f91e5a03d8d7b Mon Sep 17 00:00:00 2001 From: Mathias Nyman <mathias.nyman(a)linux.intel.com> Date: Wed, 5 Dec 2018 14:22:39 +0200 Subject: xhci: Prevent U1/U2 link pm states if exit latency is too long Don't allow USB3 U1 or U2 if the latency to wake up from the U-state reaches the service interval for a periodic endpoint. This is according to xhci 1.1 specification section 4.23.5.2 extra note: "Software shall ensure that a device is prevented from entering a U-state where its worst case exit latency approaches the ESIT." Allowing too long exit latencies for periodic endpoint confuses xHC internal scheduling, and new devices may fail to enumerate with a "Not enough bandwidth for new device state" error from the host. Cc: <stable(a)vger.kernel.org> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/usb/host/xhci.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index c20b85e28d81..dae3be1b9c8f 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -4514,6 +4514,14 @@ static u16 xhci_calculate_u1_timeout(struct xhci_hcd *xhci, { unsigned long long timeout_ns; + /* Prevent U1 if service interval is shorter than U1 exit latency */ + if (usb_endpoint_xfer_int(desc) || usb_endpoint_xfer_isoc(desc)) { + if (xhci_service_interval_to_ns(desc) <= udev->u1_params.mel) { + dev_dbg(&udev->dev, "Disable U1, ESIT shorter than exit latency\n"); + return USB3_LPM_DISABLED; + } + } + if (xhci->quirks & XHCI_INTEL_HOST) timeout_ns = xhci_calculate_intel_u1_timeout(udev, desc); else @@ -4570,6 +4578,14 @@ static u16 xhci_calculate_u2_timeout(struct xhci_hcd *xhci, { unsigned long long timeout_ns; + /* Prevent U2 if service interval is shorter than U2 exit latency */ + if (usb_endpoint_xfer_int(desc) || usb_endpoint_xfer_isoc(desc)) { + if (xhci_service_interval_to_ns(desc) <= udev->u2_params.mel) { + dev_dbg(&udev->dev, "Disable U2, ESIT shorter than exit latency\n"); + return USB3_LPM_DISABLED; + } + } + if (xhci->quirks & XHCI_INTEL_HOST) timeout_ns = xhci_calculate_intel_u2_timeout(udev, desc); else -- 2.19.2

7 years

patch "xhci: workaround CSS timeout on AMD SNPS 3.0 xHC" added to usb-linus

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled xhci: workaround CSS timeout on AMD SNPS 3.0 xHC to my usb git tree which can be found at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git in the usb-linus branch. The patch will show up in the next release of the linux-next tree (usually sometime within the next 24 hours during the week.) The patch will hopefully also be merged in Linus's tree for the next -rc kernel release. If you have any questions about this process, please let me know. >From a7d57abcc8a5bdeb53bbf8e87558e8e0a2c2a29d Mon Sep 17 00:00:00 2001 From: Sandeep Singh <sandeep.singh(a)amd.com> Date: Wed, 5 Dec 2018 14:22:38 +0200 Subject: xhci: workaround CSS timeout on AMD SNPS 3.0 xHC Occasionally AMD SNPS 3.0 xHC does not respond to CSS when set, also it does not flag anything on SRE and HCE to point the internal xHC errors on USBSTS register. This stalls the entire system wide suspend and there is no point in stalling just because of xHC CSS is not responding. To work around this problem, if the xHC does not flag anything on SRE and HCE, we can skip the CSS timeout and allow the system to continue the suspend. Once the system resume happens we can internally reset the controller using XHCI_RESET_ON_RESUME quirk Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com> Signed-off-by: Sandeep Singh <Sandeep.Singh(a)amd.com> cc: Nehal Shah <Nehal-bakulchandra.Shah(a)amd.com> Cc: <stable(a)vger.kernel.org> Tested-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/usb/host/xhci-pci.c | 4 ++++ drivers/usb/host/xhci.c | 26 ++++++++++++++++++++++---- drivers/usb/host/xhci.h | 3 +++ 3 files changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index a9515265db4d..a9ec7051f286 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -139,6 +139,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) pdev->device == 0x43bb)) xhci->quirks |= XHCI_SUSPEND_DELAY; + if (pdev->vendor == PCI_VENDOR_ID_AMD && + (pdev->device == 0x15e0 || pdev->device == 0x15e1)) + xhci->quirks |= XHCI_SNPS_BROKEN_SUSPEND; + if (pdev->vendor == PCI_VENDOR_ID_AMD) xhci->quirks |= XHCI_TRUST_TX_LENGTH; diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index c928dbbff881..c20b85e28d81 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -968,6 +968,7 @@ int xhci_suspend(struct xhci_hcd *xhci, bool do_wakeup) unsigned int delay = XHCI_MAX_HALT_USEC; struct usb_hcd *hcd = xhci_to_hcd(xhci); u32 command; + u32 res; if (!hcd->state) return 0; @@ -1021,11 +1022,28 @@ int xhci_suspend(struct xhci_hcd *xhci, bool do_wakeup) command = readl(&xhci->op_regs->command); command |= CMD_CSS; writel(command, &xhci->op_regs->command); + xhci->broken_suspend = 0; if (xhci_handshake(&xhci->op_regs->status, STS_SAVE, 0, 10 * 1000)) { - xhci_warn(xhci, "WARN: xHC save state timeout\n"); - spin_unlock_irq(&xhci->lock); - return -ETIMEDOUT; + /* + * AMD SNPS xHC 3.0 occasionally does not clear the + * SSS bit of USBSTS and when driver tries to poll + * to see if the xHC clears BIT(8) which never happens + * and driver assumes that controller is not responding + * and times out. To workaround this, its good to check + * if SRE and HCE bits are not set (as per xhci + * Section 5.4.2) and bypass the timeout. + */ + res = readl(&xhci->op_regs->status); + if ((xhci->quirks & XHCI_SNPS_BROKEN_SUSPEND) && + (((res & STS_SRE) == 0) && + ((res & STS_HCE) == 0))) { + xhci->broken_suspend = 1; + } else { + xhci_warn(xhci, "WARN: xHC save state timeout\n"); + spin_unlock_irq(&xhci->lock); + return -ETIMEDOUT; + } } spin_unlock_irq(&xhci->lock); @@ -1078,7 +1096,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) set_bit(HCD_FLAG_HW_ACCESSIBLE, &xhci->shared_hcd->flags); spin_lock_irq(&xhci->lock); - if (xhci->quirks & XHCI_RESET_ON_RESUME) + if ((xhci->quirks & XHCI_RESET_ON_RESUME) || xhci->broken_suspend) hibernated = true; if (!hibernated) { diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 260b259b72bc..c3515bad5dbb 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1850,6 +1850,7 @@ struct xhci_hcd { #define XHCI_ZERO_64B_REGS BIT_ULL(32) #define XHCI_DEFAULT_PM_RUNTIME_ALLOW BIT_ULL(33) #define XHCI_RESET_PLL_ON_DISCONNECT BIT_ULL(34) +#define XHCI_SNPS_BROKEN_SUSPEND BIT_ULL(35) unsigned int num_active_eps; unsigned int limit_active_eps; @@ -1879,6 +1880,8 @@ struct xhci_hcd { void *dbc; /* platform-specific data -- must come last */ unsigned long priv[0] __aligned(sizeof(s64)); + /* Broken Suspend flag for SNPS Suspend resume issue */ + u8 broken_suspend; }; /* Platform specific overrides to generic XHCI hc_driver ops */ -- 2.19.2

7 years

FAILED: patch "[PATCH] s390/zcrypt: reinit ap queue state machine during device" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 104f708fd1241b22f808bdf066ab67dc5a051de5 Mon Sep 17 00:00:00 2001 From: Harald Freudenberger <freude(a)linux.ibm.com> Date: Fri, 9 Nov 2018 14:59:24 +0100 Subject: [PATCH] s390/zcrypt: reinit ap queue state machine during device probe Until the vfio-ap driver came into live there was a well known agreement about the way how ap devices are initialized and their states when the driver's probe function is called. However, the vfio device driver when receiving an ap queue device does additional resets thereby removing the registration for interrupts for the ap device done by the ap bus core code. So when later the vfio driver releases the device and one of the default zcrypt drivers takes care of the device the interrupt registration needs to get renewed. The current code does no renew and result is that requests send into such a queue will never see a reply processed - the application hangs. This patch adds a function which resets the aq queue state machine for the ap queue device and triggers the walk through the initial states (which are reset and registration for interrupts). This function is now called before the driver's probe function is invoked. When the association between driver and device is released, the driver's remove function is called. The current implementation calls a ap queue function ap_queue_remove(). This invokation has been moved to the ap bus function to make the probe / remove pair for ap bus and drivers more symmetric. Fixes: 7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)") Cc: stable(a)vger.kernel.org # 4.19+ Signed-off-by: Harald Freudenberger <freude(a)linux.ibm.com> Reviewd-by: Tony Krowiak <akrowiak(a)linux.ibm.com> Reviewd-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index 048665e4f13d..9f5a201c4c87 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -775,6 +775,8 @@ static int ap_device_probe(struct device *dev) drvres = ap_drv->flags & AP_DRIVER_FLAG_DEFAULT; if (!!devres != !!drvres) return -ENODEV; + /* (re-)init queue's state machine */ + ap_queue_reinit_state(to_ap_queue(dev)); } /* Add queue/card to list of active queues/cards */ @@ -807,6 +809,8 @@ static int ap_device_remove(struct device *dev) struct ap_device *ap_dev = to_ap_dev(dev); struct ap_driver *ap_drv = ap_dev->drv; + if (is_queue_dev(dev)) + ap_queue_remove(to_ap_queue(dev)); if (ap_drv->remove) ap_drv->remove(ap_dev); @@ -1444,10 +1448,6 @@ static void ap_scan_bus(struct work_struct *unused) aq->ap_dev.device.parent = &ac->ap_dev.device; dev_set_name(&aq->ap_dev.device, "%02x.%04x", id, dom); - /* Start with a device reset */ - spin_lock_bh(&aq->lock); - ap_wait(ap_sm_event(aq, AP_EVENT_POLL)); - spin_unlock_bh(&aq->lock); /* Register device */ rc = device_register(&aq->ap_dev.device); if (rc) { diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h index 3eed1b36c876..bfc66e4a9de1 100644 --- a/drivers/s390/crypto/ap_bus.h +++ b/drivers/s390/crypto/ap_bus.h @@ -254,6 +254,7 @@ struct ap_queue *ap_queue_create(ap_qid_t qid, int device_type); void ap_queue_remove(struct ap_queue *aq); void ap_queue_suspend(struct ap_device *ap_dev); void ap_queue_resume(struct ap_device *ap_dev); +void ap_queue_reinit_state(struct ap_queue *aq); struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type, int comp_device_type, unsigned int functions); diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c index 66f7334bcb03..0aa4b3ccc948 100644 --- a/drivers/s390/crypto/ap_queue.c +++ b/drivers/s390/crypto/ap_queue.c @@ -718,5 +718,20 @@ void ap_queue_remove(struct ap_queue *aq) { ap_flush_queue(aq); del_timer_sync(&aq->timeout); + + /* reset with zero, also clears irq registration */ + spin_lock_bh(&aq->lock); + ap_zapq(aq->qid); + aq->state = AP_STATE_BORKED; + spin_unlock_bh(&aq->lock); } EXPORT_SYMBOL(ap_queue_remove); + +void ap_queue_reinit_state(struct ap_queue *aq) +{ + spin_lock_bh(&aq->lock); + aq->state = AP_STATE_RESET_START; + ap_wait(ap_sm_event(aq, AP_EVENT_POLL)); + spin_unlock_bh(&aq->lock); +} +EXPORT_SYMBOL(ap_queue_reinit_state); diff --git a/drivers/s390/crypto/zcrypt_cex2a.c b/drivers/s390/crypto/zcrypt_cex2a.c index 146f54f5cbb8..c50f3e86cc74 100644 --- a/drivers/s390/crypto/zcrypt_cex2a.c +++ b/drivers/s390/crypto/zcrypt_cex2a.c @@ -196,7 +196,6 @@ static void zcrypt_cex2a_queue_remove(struct ap_device *ap_dev) struct ap_queue *aq = to_ap_queue(&ap_dev->device); struct zcrypt_queue *zq = aq->private; - ap_queue_remove(aq); if (zq) zcrypt_queue_unregister(zq); } diff --git a/drivers/s390/crypto/zcrypt_cex2c.c b/drivers/s390/crypto/zcrypt_cex2c.c index 546f67676734..35c7c6672713 100644 --- a/drivers/s390/crypto/zcrypt_cex2c.c +++ b/drivers/s390/crypto/zcrypt_cex2c.c @@ -251,7 +251,6 @@ static void zcrypt_cex2c_queue_remove(struct ap_device *ap_dev) struct ap_queue *aq = to_ap_queue(&ap_dev->device); struct zcrypt_queue *zq = aq->private; - ap_queue_remove(aq); if (zq) zcrypt_queue_unregister(zq); } diff --git a/drivers/s390/crypto/zcrypt_cex4.c b/drivers/s390/crypto/zcrypt_cex4.c index f9d4c6c7521d..582ffa7e0f18 100644 --- a/drivers/s390/crypto/zcrypt_cex4.c +++ b/drivers/s390/crypto/zcrypt_cex4.c @@ -275,7 +275,6 @@ static void zcrypt_cex4_queue_remove(struct ap_device *ap_dev) struct ap_queue *aq = to_ap_queue(&ap_dev->device); struct zcrypt_queue *zq = aq->private; - ap_queue_remove(aq); if (zq) zcrypt_queue_unregister(zq); }

7 years, 1 month

[PATCH] mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo

by Roman Gushchin

Yongqin reported that /proc/zoneinfo format is broken in 4.14 due to commit 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat") Node 0, zone DMA per-node stats nr_inactive_anon 403 nr_active_anon 89123 nr_inactive_file 128887 nr_active_file 47377 nr_unevictable 2053 nr_slab_reclaimable 7510 nr_slab_unreclaimable 10775 nr_isolated_anon 0 nr_isolated_file 0 <...> nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_dirtied 6022 nr_written 5985 74240 ^^^^^^^^^^ pages free 131656 The problem is caused by the nr_indirectly_reclaimable counter, which is hidden from the /proc/vmstat, but not from the /proc/zoneinfo. Let's fix this inconsistency and hide the counter from /proc/zoneinfo exactly as from /proc/vmstat. BTW, in 4.19+ the counter has been renamed and exported by the commit b29940c1abd7 ("mm: rename and change semantics of nr_indirectly_reclaimable_bytes"), so there is no such a problem anymore. Cc: <stable(a)vger.kernel.org> # 4.14.x-4.18.x Fixes: 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat") Reported-by: Yongqin Liu <yongqin.liu(a)linaro.org> Signed-off-by: Roman Gushchin <guro(a)fb.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: Andrew Morton <akpm(a)linux-foundation.org> --- mm/vmstat.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/vmstat.c b/mm/vmstat.c index 527ae727d547..6389e876c7a7 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1500,6 +1500,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, if (is_zone_first_populated(pgdat, zone)) { seq_printf(m, "\n per-node stats"); for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { + /* Skip hidden vmstat items. */ + if (*vmstat_text[i + NR_VM_ZONE_STAT_ITEMS + + NR_VM_NUMA_STAT_ITEMS] == '\0') + continue; seq_printf(m, "\n %-12s %lu", vmstat_text[i + NR_VM_ZONE_STAT_ITEMS + NR_VM_NUMA_STAT_ITEMS], -- 2.17.2

7 years, 1 month

stable request: mm: mlock: avoid increase mm->locked_vm on mlock() when already mlock2(,MLOCK_ONFAULT)

by Rafael David Tinoco

Hello Greg, Could you please consider backporting to v4.4 the following commit: commit b5b5b6fe643391209b08528bef410e0cf299b826 Author: Simon Guo <wei.guo.simon(a)gmail.com> Date: Fri Oct 7 20:59:40 2016 mm: mlock: avoid increase mm->locked_vm on mlock() when already mlock2(,MLOCK_ONFAULT) It seems to be a trivial fix for: https://bugs.linaro.org/show_bug.cgi?id=4043 (mlock203.c LTP test failing on v4.4) Thanks in advance, -- Rafael D. Tinoco Linaro Kernel Validation Team

7 years, 1 month

[PATCH] ubi: fastmap: Check each mapping only once

by Martin Kepplinger

From: Richard Weinberger <richard(a)nod.at> [ Upstream commit 34653fd8c46e771585fce5975e4243f8fd401914 ] This commit got merged along with commit 781932375ffc ("ubi: fastmap: Correctly handle interrupted erasures in EBA") upstream but only the latter has been applied to stable v4.14.54 as commit a23cf10d9abb. This resulted in a performance regression. Startup on i.MX platforms is delayed for up to a few seconds depending on the platform. This fixes ubi fastmap to be of the same performance as it has been before said fastmap changes. Fixes: a23cf10d9abb ("ubi: fastmap: Correctly handle interrupted erasures in EBA") Signed-off-by: Richard Weinberger <richard(a)nod.at> Signed-off-by: Martin Kepplinger <martin.kepplinger(a)ginzinger.com> --- Richard, although this fixes a major slowdown regression in -stable, do you consider this "stable" too? This applies and is tested only for the 4.14 stable tree. It seems to be equally relevant for 4.9 and 4.4 though. thanks martin drivers/mtd/ubi/build.c | 1 + drivers/mtd/ubi/eba.c | 4 ++++ drivers/mtd/ubi/fastmap.c | 20 ++++++++++++++++++++ drivers/mtd/ubi/ubi.h | 11 +++++++++++ drivers/mtd/ubi/vmt.c | 1 + drivers/mtd/ubi/vtbl.c | 16 +++++++++++++++- 6 files changed, 52 insertions(+), 1 deletion(-) diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c index 18a72da759a0..6445c693d935 100644 --- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c @@ -526,6 +526,7 @@ void ubi_free_internal_volumes(struct ubi_device *ubi) for (i = ubi->vtbl_slots; i < ubi->vtbl_slots + UBI_INT_VOL_COUNT; i++) { ubi_eba_replace_table(ubi->volumes[i], NULL); + ubi_fastmap_destroy_checkmap(ubi->volumes[i]); kfree(ubi->volumes[i]); } } diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c index d0884bd9d955..c4d4b8f07630 100644 --- a/drivers/mtd/ubi/eba.c +++ b/drivers/mtd/ubi/eba.c @@ -517,6 +517,9 @@ static int check_mapping(struct ubi_device *ubi, struct ubi_volume *vol, int lnu if (!ubi->fast_attach) return 0; + if (!vol->checkmap || test_bit(lnum, vol->checkmap)) + return 0; + vidb = ubi_alloc_vid_buf(ubi, GFP_NOFS); if (!vidb) return -ENOMEM; @@ -551,6 +554,7 @@ static int check_mapping(struct ubi_device *ubi, struct ubi_volume *vol, int lnu goto out_free; } + set_bit(lnum, vol->checkmap); err = 0; out_free: diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c index 5a832bc79b1b..63e8527f7b65 100644 --- a/drivers/mtd/ubi/fastmap.c +++ b/drivers/mtd/ubi/fastmap.c @@ -1101,6 +1101,26 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai, goto out; } +int ubi_fastmap_init_checkmap(struct ubi_volume *vol, int leb_count) +{ + struct ubi_device *ubi = vol->ubi; + + if (!ubi->fast_attach) + return 0; + + vol->checkmap = kcalloc(BITS_TO_LONGS(leb_count), sizeof(unsigned long), + GFP_KERNEL); + if (!vol->checkmap) + return -ENOMEM; + + return 0; +} + +void ubi_fastmap_destroy_checkmap(struct ubi_volume *vol) +{ + kfree(vol->checkmap); +} + /** * ubi_write_fastmap - writes a fastmap. * @ubi: UBI device object diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h index 5fe62653995e..f5ba97c46160 100644 --- a/drivers/mtd/ubi/ubi.h +++ b/drivers/mtd/ubi/ubi.h @@ -334,6 +334,9 @@ struct ubi_eba_leb_desc { * @changing_leb: %1 if the atomic LEB change ioctl command is in progress * @direct_writes: %1 if direct writes are enabled for this volume * + * @checkmap: bitmap to remember which PEB->LEB mappings got checked, + * protected by UBI LEB lock tree. + * * The @corrupted field indicates that the volume's contents is corrupted. * Since UBI protects only static volumes, this field is not relevant to * dynamic volumes - it is user's responsibility to assure their data @@ -377,6 +380,10 @@ struct ubi_volume { unsigned int updating:1; unsigned int changing_leb:1; unsigned int direct_writes:1; + +#ifdef CONFIG_MTD_UBI_FASTMAP + unsigned long *checkmap; +#endif }; /** @@ -965,8 +972,12 @@ size_t ubi_calc_fm_size(struct ubi_device *ubi); int ubi_update_fastmap(struct ubi_device *ubi); int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai, struct ubi_attach_info *scan_ai); +int ubi_fastmap_init_checkmap(struct ubi_volume *vol, int leb_count); +void ubi_fastmap_destroy_checkmap(struct ubi_volume *vol); #else static inline int ubi_update_fastmap(struct ubi_device *ubi) { return 0; } +int static inline ubi_fastmap_init_checkmap(struct ubi_volume *vol, int leb_count) { return 0; } +static inline void ubi_fastmap_destroy_checkmap(struct ubi_volume *vol) {} #endif /* block.c */ diff --git a/drivers/mtd/ubi/vmt.c b/drivers/mtd/ubi/vmt.c index 3fd8d7ff7a02..0be516780e92 100644 --- a/drivers/mtd/ubi/vmt.c +++ b/drivers/mtd/ubi/vmt.c @@ -139,6 +139,7 @@ static void vol_release(struct device *dev) struct ubi_volume *vol = container_of(dev, struct ubi_volume, dev); ubi_eba_replace_table(vol, NULL); + ubi_fastmap_destroy_checkmap(vol); kfree(vol); } diff --git a/drivers/mtd/ubi/vtbl.c b/drivers/mtd/ubi/vtbl.c index 263743e7b741..94d7a865b135 100644 --- a/drivers/mtd/ubi/vtbl.c +++ b/drivers/mtd/ubi/vtbl.c @@ -534,7 +534,7 @@ static int init_volumes(struct ubi_device *ubi, const struct ubi_attach_info *ai, const struct ubi_vtbl_record *vtbl) { - int i, reserved_pebs = 0; + int i, err, reserved_pebs = 0; struct ubi_ainf_volume *av; struct ubi_volume *vol; @@ -620,6 +620,16 @@ static int init_volumes(struct ubi_device *ubi, (long long)(vol->used_ebs - 1) * vol->usable_leb_size; vol->used_bytes += av->last_data_size; vol->last_eb_bytes = av->last_data_size; + + /* + * We use ubi->peb_count and not vol->reserved_pebs because + * we want to keep the code simple. Otherwise we'd have to + * resize/check the bitmap upon volume resize too. + * Allocating a few bytes more does not hurt. + */ + err = ubi_fastmap_init_checkmap(vol, ubi->peb_count); + if (err) + return err; } /* And add the layout volume */ @@ -645,6 +655,9 @@ static int init_volumes(struct ubi_device *ubi, reserved_pebs += vol->reserved_pebs; ubi->vol_count += 1; vol->ubi = ubi; + err = ubi_fastmap_init_checkmap(vol, UBI_LAYOUT_VOLUME_EBS); + if (err) + return err; if (reserved_pebs > ubi->avail_pebs) { ubi_err(ubi, "not enough PEBs, required %d, available %d", @@ -849,6 +862,7 @@ int ubi_read_volume_table(struct ubi_device *ubi, struct ubi_attach_info *ai) out_free: vfree(ubi->vtbl); for (i = 0; i < ubi->vtbl_slots + UBI_INT_VOL_COUNT; i++) { + ubi_fastmap_destroy_checkmap(ubi->volumes[i]); kfree(ubi->volumes[i]); ubi->volumes[i] = NULL; } -- 2.19.1

7 years, 1 month

v4.14 fix for Hikey 960 unbalanced IRQ enablement

by Rafael David Tinoco

Sasha, could you consider including this cherry-picked patchset in v4.14. Kernel v4.14 might suffer from the following unbalanced enablement for the board Hikey 960: Nov 5 12:02:54 hikey kernel: [ 22.148194] Unbalanced enable for IRQ 44 Nov 5 12:02:54 hikey kernel: [ 22.152193] ------------[ cut here ]------------ Nov 5 12:02:54 hikey kernel: [ 22.156872] WARNING: CPU: 2 PID: 509 at /home/inaddy/work/sources/linux/stable/stable-linux-4.14.y/kernel/irq/manage.c:525 __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.249606] CPU: 2 PID: 509 Comm: kworker/2:2 Not tainted 4.14.79 #1 Nov 5 12:02:54 hikey kernel: [ 22.255975] Hardware name: HiKey Development Board (DT) Nov 5 12:02:54 hikey kernel: [ 22.261248] Workqueue: events_freezable thermal_zone_device_check Nov 5 12:02:54 hikey kernel: [ 22.267368] task: ffff8000616e0e00 task.stack: ffff00000b5f0000 Nov 5 12:02:54 hikey kernel: [ 22.273312] PC is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.277516] LR is at __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.281718] pc : [<ffff00000813e010>] lr : [<ffff00000813e010>] pstate: 000001c5 Nov 5 12:02:54 hikey kernel: [ 22.289129] sp : ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.292457] x29: ffff00000b5f3c80 x28: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.297804] x27: ffff80005c139e38 x26: ffff000008a71870 Nov 5 12:02:54 hikey kernel: [ 22.303148] x25: 0000000000000000 x24: 0000000000000002 Nov 5 12:02:54 hikey kernel: [ 22.308492] x23: ffff00000b5f3d9c x22: ffff80005d565e88 Nov 5 12:02:54 hikey kernel: [ 22.313836] x21: 000000000000f980 x20: 000000000000002c Nov 5 12:02:54 hikey kernel: [ 22.319181] x19: ffff800061726000 x18: 0000000000000010 Nov 5 12:02:54 hikey kernel: [ 22.324524] x17: 0000000000000000 x16: 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.329868] x15: ffffffffffffffff x14: ffff000009269c08 Nov 5 12:02:54 hikey kernel: [ 22.335213] x13: ffff00008940678f x12: ffff000009406797 Nov 5 12:02:54 hikey kernel: [ 22.340558] x11: ffff000009290000 x10: ffff00000b5f3980 Nov 5 12:02:54 hikey kernel: [ 22.345902] x9 : 00000000ffffffd0 x8 : ffff00000862c298 Nov 5 12:02:54 hikey kernel: [ 22.351246] x7 : 6c62616e65206465 x6 : 00000000000001b2 Nov 5 12:02:54 hikey kernel: [ 22.356589] x5 : 0000000000000000 x4 : 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.361931] x3 : 0000000000000000 x2 : ffff800063e824c8 Nov 5 12:02:54 hikey kernel: [ 22.367275] x1 : 000080005af95000 x0 : 000000000000001c Nov 5 12:02:54 hikey kernel: [ 22.372618] Call trace: Nov 5 12:02:54 hikey kernel: [ 22.375088] Exception stack(0xffff00000b5f3b40 to 0xffff00000b5f3c80) Nov 5 12:02:54 hikey kernel: [ 22.381560] 3b40: 000000000000001c 000080005af95000 ffff800063e824c8 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.389417] 3b60: 0000000000000000 0000000000000000 00000000000001b2 6c62616e65206465 Nov 5 12:02:54 hikey kernel: [ 22.397276] 3b80: ffff00000862c298 00000000ffffffd0 ffff00000b5f3980 ffff000009290000 Nov 5 12:02:54 hikey kernel: [ 22.405136] 3ba0: ffff000009406797 ffff00008940678f ffff000009269c08 ffffffffffffffff Nov 5 12:02:54 hikey kernel: [ 22.412994] 3bc0: 0000000000000000 0000000000000000 0000000000000010 ffff800061726000 Nov 5 12:02:54 hikey kernel: [ 22.420852] 3be0: 000000000000002c 000000000000f980 ffff80005d565e88 ffff00000b5f3d9c Nov 5 12:02:54 hikey kernel: [ 22.428710] 3c00: 0000000000000002 0000000000000000 ffff000008a71870 ffff80005c139e38 Nov 5 12:02:54 hikey kernel: [ 22.436569] 3c20: 0000000000000000 ffff00000b5f3c80 ffff00000813e010 ffff00000b5f3c80 Nov 5 12:02:54 hikey kernel: [ 22.444426] 3c40: ffff00000813e010 00000000000001c5 0000000000000000 0000000000000000 Nov 5 12:02:54 hikey kernel: [ 22.452286] 3c60: ffffffffffffffff ffff800061800618 ffff00000b5f3c80 ffff00000813e010 Nov 5 12:02:54 hikey kernel: [ 22.460144] [<ffff00000813e010>] __enable_irq+0x78/0x80 Nov 5 12:02:54 hikey kernel: [ 22.465394] [<ffff00000813e058>] enable_irq+0x40/0x78 Nov 5 12:02:54 hikey kernel: [ 22.470493] [<ffff000000e228a8>] hisi_thermal_get_temp+0x1b0/0x1d8 [hisi_thermal] Nov 5 12:02:54 hikey kernel: [ 22.478008] [<ffff0000087121a8>] of_thermal_get_temp+0x38/0x50 Nov 5 12:02:54 hikey kernel: [ 22.483869] [<ffff000008711790>] thermal_zone_get_temp+0x58/0x80 Nov 5 12:02:54 hikey kernel: [ 22.489903] [<ffff00000870e7bc>] thermal_zone_device_update.part.4+0x2c/0x1a8 Nov 5 12:02:54 hikey kernel: [ 22.497066] [<ffff00000870e9c8>] thermal_zone_device_check+0x40/0x50 Nov 5 12:02:54 hikey kernel: [ 22.503457] [<ffff0000080f1674>] process_one_work+0x19c/0x3d0 Nov 5 12:02:54 hikey kernel: [ 22.509236] [<ffff0000080f18f4>] worker_thread+0x4c/0x428 Nov 5 12:02:54 hikey kernel: [ 22.514664] [<ffff0000080f84fc>] kthread+0x134/0x138 Nov 5 12:02:54 hikey kernel: [ 22.519659] [<ffff000008085154>] ret_from_fork+0x10/0x1c Nov 5 12:02:54 hikey kernel: [ 22.524988] ---[ end trace 328d4bb2d9b066a0 ]--- This issue was solved when "hisi_thermal_alarm_irq" function was removed so only "hisi_thermal_alarm_irq_thread" would exist. This has fixed the issue for the unbalanced enablement since there is no more: disable_irq_nosync(irq); data->irq_enabled = false; logic being done in parallel to the threaded handler AND the thermal_zone_device_update() call only happens now if the temperature is already above the threshold.

7 years, 1 month

[PATCH 3.18-4.14] mm: cleancache: fix corruption on missed inode invalidation

by Vasily Averin

Backported mainline commit 6ff38bd40230af35 ("mm: cleancache: fix corruption on missed inode invalidation") From: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com> Date: Fri, 30 Nov 2018 14:09:00 -0800 Subject: [PATCH] mm: cleancache: fix corruption on missed inode invalidation If all pages are deleted from the mapping by memory reclaim and also moved to the cleancache: __delete_from_page_cache (no shadow case) unaccount_page_cache_page cleancache_put_page page_cache_delete mapping->nrpages -= nr (nrpages becomes 0) We don't clean the cleancache for an inode after final file truncation (removal). truncate_inode_pages_final check (nrpages || nrexceptional) is false no truncate_inode_pages no cleancache_invalidate_inode(mapping) These way when reading the new file created with same inode we may get these trash leftover pages from cleancache and see wrong data instead of the contents of the new file. Fix it by always doing truncate_inode_pages which is already ready for nrpages == 0 && nrexceptional == 0 case and just invalidates inode. [akpm(a)linux-foundation.org: add comment, per Jan] Link: http://lkml.kernel.org/r/20181112095734.17979-1-ptikhomirov@virtuozzo.com Fixes: commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache") Signed-off-by: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com> Reviewed-by: Vasily Averin <vvs(a)virtuozzo.com> Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com> Reviewed-by: Jan Kara <jack(a)suse.cz> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Mel Gorman <mgorman(a)techsingularity.net> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Vasily Averin <vvs(a)virtuozzo.com> --- mm/truncate.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c index 2330223841fb..d3a2737cc188 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -471,9 +471,13 @@ void truncate_inode_pages_final(struct address_space *mapping) */ spin_lock_irq(&mapping->tree_lock); spin_unlock_irq(&mapping->tree_lock); - - truncate_inode_pages(mapping, 0); } + + /* + * Cleancache needs notification even if there are no pages or shadow + * entries. + */ + truncate_inode_pages(mapping, 0); } EXPORT_SYMBOL(truncate_inode_pages_final); -- 2.17.1

7 years, 1 month

request for 4.14-stable: b54e41f5efcb ("udf: Allow mounting volumes with incorrect identification strings")

by Sudip Mukherjee

Hi Greg, This has been applied to 4.19-stable, but should be needed in 4.14-stable also. I have attached the backported patch and have also tested with one such DVD where the problem was seen. -- Regards Sudip

7 years, 1 month

← Newer
1
...
61
62
63
64
65
66
67
...
85
Older →

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror December 2018