In the CQE request completion path, the request type is checked after
the request has already been completed. This returns the wrong request
type: the ASYNC issue type is reported for DCMD requests. Because of
this mismatch, the mq->cqe_busy flag is never cleared and the driver
never invokes blk_mq_run_hw_queue(), so requests are no longer
dispatched to the LLD from the block layer. All of this eventually
results in an I/O hang.
So, get the request type before completing the request.
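For illustration, an abridged sketch of the corrected ordering
(simplified and not compilable as-is; only the lines relevant to the
race are shown):

static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
{
	/*
	 * Read the issue type while the request is still valid. Once
	 * the request has been completed it may be recycled, and
	 * mmc_issue_type() can then misreport a DCMD request as ASYNC.
	 */
	enum mmc_issue_type issue_type = mmc_issue_type(mq, req);

	/* ... error handling and blk_mq_end_request() ... */

	spin_lock_irqsave(&mq->lock, flags);
	/* Decrement the in-flight counter for the saved type. */
	mq->in_flight[issue_type] -= 1;
	put_card = (mmc_tot_in_flight(mq) == 0);
	spin_unlock_irqrestore(&mq->lock, flags);
	/* ... */
}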
Cc: <stable(a)vger.kernel.org> # v4.19+
Signed-off-by: Veerabhadrarao Badiganti <vbadigan(a)codeaurora.org>
---
drivers/mmc/core/block.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8499b56..c5367e2 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
struct mmc_request *mrq = &mqrq->brq.mrq;
struct request_queue *q = req->q;
struct mmc_host *host = mq->card->host;
+ enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
unsigned long flags;
bool put_card;
int err;
@@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
spin_lock_irqsave(&mq->lock, flags);
- mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+ mq->in_flight[issue_type] -= 1;
put_card = (mmc_tot_in_flight(mq) == 0);
--
Hi,
I believe some patches are needed to fix build issues on Hexagon:
ac32292c8552f7e8517be184e65dd09786e991f9 hexagon: clean up ioremap
7312b70699252074d753c5005fc67266c547bbe3 hexagon: define ioremap_uc
The same applies to stable v5.4.
Best,
Tuowen
The following race occurs while accessing the dmabuf object exported as
file:
P1                              P2
dma_buf_release()               dmabuffs_dname()
                                [say lsof reading /proc/<P1 pid>/fd/<num>]
                                read dmabuf stored in dentry->d_fsdata
Free the dmabuf object
                                Start accessing the dmabuf structure
In the above sequence, the dmabuf object freed in P1 is accessed from
P2, resulting in a use-after-free. The reported stack dump is included
below.
We read the dmabuf object stored in dentry->d_fsdata, but there is no
binding between the dentry and the dmabuf, which means the dmabuf can
be freed while it is being read from ->d_fsdata and is still in use.
Review of patch v1 concluded that protecting the in-use dmabuf with an
extra refcount is not a viable solution, as the exported dmabuf is
already under the file's refcount and keeping multiple refcounts on
the same object coordinated is not possible.
Since the dmabuf stored in ->d_fsdata is only read to retrieve the
user-supplied name, store the name itself in ->d_fsdata instead; this
avoids reading the dmabuf altogether.
Call Trace:
kasan_report+0x12/0x20
__asan_report_load8_noabort+0x14/0x20
dmabuffs_dname+0x4f4/0x560
tomoyo_realpath_from_path+0x165/0x660
tomoyo_get_realpath
tomoyo_check_open_permission+0x2a3/0x3e0
tomoyo_file_open
tomoyo_file_open+0xa9/0xd0
security_file_open+0x71/0x300
do_dentry_open+0x37a/0x1380
vfs_open+0xa0/0xd0
path_openat+0x12ee/0x3490
do_filp_open+0x192/0x260
do_sys_openat2+0x5eb/0x7e0
do_sys_open+0xf2/0x180
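A condensed view of the resulting ->d_fsdata lifecycle, abridged from
the diff below (error paths omitted):

/* dma_buf_set_name(): publish the user-supplied name, not the dmabuf */
dmabuf->file->f_path.dentry->d_fsdata = name;

/* dmabuffs_dname(): read the name under d_lock */
spin_lock(&dentry->d_lock);
if (dentry->d_fsdata)
	ret = strlcpy(name, dentry->d_fsdata, DMA_BUF_NAME_LEN);
spin_unlock(&dentry->d_lock);

/* dma_buf_release(): clear the pointer under the same lock, so a
 * concurrent dmabuffs_dname() sees either the name or NULL, never
 * a dangling pointer */
spin_lock(&dentry->d_lock);
dentry->d_fsdata = NULL;
spin_unlock(&dentry->d_lock);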
Fixes: bb2bb9030425 ("dma-buf: add DMA_BUF_SET_NAME ioctls")
Reported-by: syzbot+3643a18836bce555bff6(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org> [5.3+]
Signed-off-by: Charan Teja Reddy <charante(a)codeaurora.org>
---
Changes in v2:
- Pass the user passed name in ->d_fsdata instead of dmabuf
- Improve the commit message
Changes in v1: (https://patchwork.kernel.org/patch/11514063/)
drivers/dma-buf/dma-buf.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 01ce125..0071f7d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -25,6 +25,7 @@
#include <linux/mm.h>
#include <linux/mount.h>
#include <linux/pseudo_fs.h>
+#include <linux/dcache.h>
#include <uapi/linux/dma-buf.h>
#include <uapi/linux/magic.h>
@@ -40,15 +41,13 @@ struct dma_buf_list {
static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
{
- struct dma_buf *dmabuf;
char name[DMA_BUF_NAME_LEN];
size_t ret = 0;
- dmabuf = dentry->d_fsdata;
- dma_resv_lock(dmabuf->resv, NULL);
- if (dmabuf->name)
- ret = strlcpy(name, dmabuf->name, DMA_BUF_NAME_LEN);
- dma_resv_unlock(dmabuf->resv);
+ spin_lock(&dentry->d_lock);
+ if (dentry->d_fsdata)
+ ret = strlcpy(name, dentry->d_fsdata, DMA_BUF_NAME_LEN);
+ spin_unlock(&dentry->d_lock);
return dynamic_dname(dentry, buffer, buflen, "/%s:%s",
dentry->d_name.name, ret > 0 ? name : "");
@@ -80,12 +79,16 @@ static int dma_buf_fs_init_context(struct fs_context *fc)
static int dma_buf_release(struct inode *inode, struct file *file)
{
struct dma_buf *dmabuf;
+ struct dentry *dentry = file->f_path.dentry;
if (!is_dma_buf_file(file))
return -EINVAL;
dmabuf = file->private_data;
+ spin_lock(&dentry->d_lock);
+ dentry->d_fsdata = NULL;
+ spin_unlock(&dentry->d_lock);
BUG_ON(dmabuf->vmapping_counter);
/*
@@ -343,6 +346,7 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
}
kfree(dmabuf->name);
dmabuf->name = name;
+ dmabuf->file->f_path.dentry->d_fsdata = name;
out_unlock:
dma_resv_unlock(dmabuf->resv);
@@ -446,7 +450,6 @@ static struct file *dma_buf_getfile(struct dma_buf *dmabuf, int flags)
goto err_alloc_file;
file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
file->private_data = dmabuf;
- file->f_path.dentry->d_fsdata = dmabuf;
return file;
--
From: Henry Willard <henry.willard(a)oracle.com>
Subject: mm: limit boost_watermark on small zones
Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external
fragmentation event occurs") adds a boost_watermark() function which
increases the min watermark in a zone by at least pageblock_nr_pages or
the number of pages in a page block. On Arm64, with 64K pages and 512M
huge pages, this is 8192 pages, or 512M. It does this regardless of the
number of pages managed in the zone or the likelihood of success.
This can put the zone immediately under water in terms of allocating pages
from the zone, and can cause a small machine to fail immediately due to
OOM. Unlike set_recommended_min_free_kbytes(), which substantially
increases min_free_kbytes and is tied to THP, boost_watermark() can be
called even if THP is not active. The problem is most likely to appear
on architectures such as Arm64 where pageblock_nr_pages is very large.
It is desirable to run the kdump capture kernel in as small a space as
possible to avoid wasting memory. In some architectures, such as Arm64,
there are restrictions on where the capture kernel can run, and therefore,
the space available. A capture kernel running in 768M can fail due to OOM
immediately after boost_watermark() sets the min in zone DMA32, where
most of the memory is, to 512M. It fails even though there is over 500M of
free memory. With boost_watermark() suppressed, the capture kernel can run
successfully in 448M.
This patch limits boost_watermark() to boosting a zone's min watermark only
when there are enough pages that the boost will produce positive results.
In this case that is estimated to be four times as many pages as
pageblock_nr_pages.
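To make the threshold concrete, a worked example using the Arm64
configuration described above (64K base pages, 512M huge pages):

	pageblock_nr_pages = 512M / 64K              = 8192 pages
	boost threshold    = 4 * pageblock_nr_pages  = 32768 pages = 2G

A zone must therefore manage more than 2G of memory before
boost_watermark() will raise its min watermark; the DMA32 zone of a
768M capture kernel falls well below that, so the boost is skipped.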
Mel said:
: There is no harm in marking it stable. Clearly it does not happen very
: often but it's not impossible. 32-bit x86 is a lot less common now
: which would previously have been vulnerable to triggering this easily.
: ppc64 has a larger base page size but typically only has one zone.
: arm64 is likely the most vulnerable, particularly when CMA is
: configured with a small movable zone.
Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@ora…
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Henry Willard <henry.willard(a)oracle.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/page_alloc.c~mm-limit-boost_watermark-on-small-zones
+++ a/mm/page_alloc.c
@@ -2401,6 +2401,14 @@ static inline void boost_watermark(struc
if (!watermark_boost_factor)
return;
+ /*
+ * Don't bother in zones that are unlikely to produce results.
+ * On small machines, including kdump capture kernels running
+ * in a small area, boosting the watermark can cause an out of
+ * memory situation immediately.
+ */
+ if ((pageblock_nr_pages * 4) > zone_managed_pages(zone))
+ return;
max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
watermark_boost_factor, 10000);
_
From: Roman Penyaev <rpenyaev(a)suse.de>
Subject: epoll: atomically remove wait entry on wake up
This patch does two things:
1. fixes lost wakeup introduced by:
339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
2. improves performance for events delivery.
The description of the problem is the following: if N (>1) threads are
waiting on ep->wq for new events and M (>1) events arrive, it is quite
likely that more than one wakeup hits the same wait queue entry,
because there is a sizable window between the
__add_wait_queue_exclusive() and the following __remove_wait_queue()
calls in ep_poll(). This can lead to lost wakeups, because the thread
that was woken up may not handle all the events in ->rdllist. (The
problem is described in more detail here:
https://lkml.org/lkml/2019/10/7/905.)
The idea of this patch is to use init_wait() instead of
init_waitqueue_entry(). Internally, init_wait() sets
autoremove_wake_function() as the callback, which removes the wait
entry from the list atomically (under the wait queue lock), so the
next incoming wakeup hits the next wait entry in the wait queue,
preventing lost wakeups.
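For reference, autoremove_wake_function(), as implemented in
kernel/sched/wait.c (shown abridged), unlinks the entry under the wait
queue lock only when the wakeup actually succeeds:

int autoremove_wake_function(struct wait_queue_entry *wq_entry,
			     unsigned mode, int sync, void *key)
{
	/* Wake the task exactly as default_wake_function() would ... */
	int ret = default_wake_function(wq_entry, mode, sync, key);

	/* ... and on success atomically drop the entry from the wait
	 * queue, so the next wakeup targets the next waiter. */
	if (ret)
		list_del_init(&wq_entry->entry);

	return ret;
}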
The problem is readily reproduced by the epoll60 test case [1].
Removing the wait entry on wakeup also has performance benefits,
because there is no need to take ep->lock and remove the wait entry
from the queue after a successful wakeup. Here is the timing output of
the epoll60 test case:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior 339ddb53d373):
real 0m6.970s
user 0m49.786s
sys 0m0.113s
After this patch:
real 0m5.220s
user 0m36.879s
sys 0m0.019s
The other testcase is the stress-epoll [2], where one thread consumes
all the events and other threads produce many events:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior 339ddb53d373):
threads    events/ms    run-time ms
      8         5427           1474
     16         6163           2596
     32         6824           4689
     64         7060           9064
    128         6991          18309
After this patch:
threads    events/ms    run-time ms
      8         5598           1429
     16         7073           2262
     32         7502           4265
     64         7640           8376
    128         7634          16767
("events/ms" represents event bandwidth, so higher is better;
"run-time ms" is the overall time spent running the benchmark, so
lower is better)
[1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
[2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev(a)suse.de>
Reviewed-by: Jason Baron <jbaron(a)akamai.com>
Cc: Khazhismel Kumykov <khazhy(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Heiher <r(a)hev.cc>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/eventpoll.c | 43 ++++++++++++++++++++++++-------------------
1 file changed, 24 insertions(+), 19 deletions(-)
--- a/fs/eventpoll.c~epoll-atomically-remove-wait-entry-on-wake-up
+++ a/fs/eventpoll.c
@@ -1822,7 +1822,6 @@ static int ep_poll(struct eventpoll *ep,
{
int res = 0, eavail, timed_out = 0;
u64 slack = 0;
- bool waiter = false;
wait_queue_entry_t wait;
ktime_t expires, *to = NULL;
@@ -1867,21 +1866,23 @@ fetch_events:
*/
ep_reset_busy_poll_napi_id(ep);
- /*
- * We don't have any available event to return to the caller. We need
- * to sleep here, and we will be woken by ep_poll_callback() when events
- * become available.
- */
- if (!waiter) {
- waiter = true;
- init_waitqueue_entry(&wait, current);
-
+ do {
+ /*
+ * Internally init_wait() uses autoremove_wake_function(),
+ * thus wait entry is removed from the wait queue on each
+ * wakeup. Why it is important? In case of several waiters
+ * each new wakeup will hit the next waiter, giving it the
+ * chance to harvest new event. Otherwise wakeup can be
+ * lost. This is also good performance-wise, because on
+ * normal wakeup path no need to call __remove_wait_queue()
+ * explicitly, thus ep->lock is not taken, which halts the
+ * event delivery.
+ */
+ init_wait(&wait);
write_lock_irq(&ep->lock);
__add_wait_queue_exclusive(&ep->wq, &wait);
write_unlock_irq(&ep->lock);
- }
- for (;;) {
/*
* We don't want to sleep if the ep_poll_callback() sends us
* a wakeup in between. That's why we set the task state
@@ -1911,10 +1912,20 @@ fetch_events:
timed_out = 1;
break;
}
- }
+
+ /* We were woken up, thus go and try to harvest some events */
+ eavail = 1;
+
+ } while (0);
__set_current_state(TASK_RUNNING);
+ if (!list_empty_careful(&wait.entry)) {
+ write_lock_irq(&ep->lock);
+ __remove_wait_queue(&ep->wq, &wait);
+ write_unlock_irq(&ep->lock);
+ }
+
send_events:
/*
* Try to transfer events to user space. In case we get 0 events and
@@ -1925,12 +1936,6 @@ send_events:
!(res = ep_send_events(ep, events, maxevents)) && !timed_out)
goto fetch_events;
- if (waiter) {
- write_lock_irq(&ep->lock);
- __remove_wait_queue(&ep->wq, &wait);
- write_unlock_irq(&ep->lock);
- }
-
return res;
}
_
From: Khazhismel Kumykov <khazhy(a)google.com>
Subject: eventpoll: fix missing wakeup for ovflist in ep_poll_callback
Before 339ddb53d373, when an event was added to the ovflist the wakeup
came from ep_scan_ready_list(), and ep_poll_callback() issued none
itself. With that wakeup removed, an event added to the ovflist here
may never produce a wakeup. Rather than adding back the
ep_scan_ready_list() wakeup - which was resulting in unnecessary
wakeups - trigger the wake-up in ep_poll_callback() instead.
We noticed that one of our workloads was missing wakeups starting with
339ddb53d373, and upon manual inspection this wakeup seemed to be the
missing one. With this patch applied, we no longer see missing
wakeups. I haven't yet tried to make a small reproducer, but the
existing kselftests in filesystems/epoll passed for me with this patch.
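The key effect of the restructure, abridged from the diff below, is
that the ovflist case no longer jumps past the wakeup logic at the end
of ep_poll_callback():

	if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
		/* Events are being transferred to user space; park
		 * this one on ep->ovflist to be requeued later. */
		if (chain_epi_lockless(epi))
			ep_pm_stay_awake_rcu(epi);
	} else if (!ep_is_linked(epi)) {
		/* Usual case: add the event to the ready list. */
		if (list_add_tail_lockless(&epi->rdllink, &ep->rdllist))
			ep_pm_stay_awake_rcu(epi);
	}
	/* Both branches now fall through to the waiter wakeup below,
	 * where previously the ovflist path did "goto out_unlock". */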
[khazhy(a)google.com: use if/elif instead of goto + cleanup suggested by Roman]
Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com
Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com
Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Khazhismel Kumykov <khazhy(a)google.com>
Reviewed-by: Roman Penyaev <rpenyaev(a)suse.de>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Roman Penyaev <rpenyaev(a)suse.de>
Cc: Heiher <r(a)hev.cc>
Cc: Jason Baron <jbaron(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/eventpoll.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
--- a/fs/eventpoll.c~eventpoll-fix-missing-wakeup-for-ovflist-in-ep_poll_callback
+++ a/fs/eventpoll.c
@@ -1171,6 +1171,10 @@ static inline bool chain_epi_lockless(st
{
struct eventpoll *ep = epi->ep;
+ /* Fast preliminary check */
+ if (epi->next != EP_UNACTIVE_PTR)
+ return false;
+
/* Check that the same epi has not been just chained from another CPU */
if (cmpxchg(&epi->next, EP_UNACTIVE_PTR, NULL) != EP_UNACTIVE_PTR)
return false;
@@ -1237,16 +1241,12 @@ static int ep_poll_callback(wait_queue_e
* chained in ep->ovflist and requeued later on.
*/
if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
- if (epi->next == EP_UNACTIVE_PTR &&
- chain_epi_lockless(epi))
+ if (chain_epi_lockless(epi))
+ ep_pm_stay_awake_rcu(epi);
+ } else if (!ep_is_linked(epi)) {
+ /* In the usual case, add event to ready list. */
+ if (list_add_tail_lockless(&epi->rdllink, &ep->rdllist))
ep_pm_stay_awake_rcu(epi);
- goto out_unlock;
- }
-
- /* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(epi) &&
- list_add_tail_lockless(&epi->rdllink, &ep->rdllist)) {
- ep_pm_stay_awake_rcu(epi);
}
/*
_