The patch titled
Subject: mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
has been added to the -mm tree. Its filename is
mm-page_alloc-fix-watchdog-soft-lockups-during-set_zone_contiguous.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-watchdog-soft-lo…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-watchdog-soft-lo…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.
[ 105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 105.608933] Modules linked in:
[ 105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
[ 105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
[ 105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
[ 105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
[ 105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
[ 105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
[ 105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
[ 105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
[ 105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
[ 105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 105.608933] FS: 0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
[ 105.608933] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
[ 105.608933] Call Trace:
[ 105.608933] set_zone_contiguous+0x56/0x70
[ 105.608933] page_alloc_init_late+0x166/0x176
[ 105.608933] kernel_init_freeable+0xfa/0x255
[ 105.608933] ? rest_init+0xaa/0xaa
[ 105.608933] kernel_init+0xa/0x106
[ 105.608933] ret_from_fork+0x35/0x40
The issue becomes visible when having a lot of memory (e.g., 4TB) assigned
to a single NUMA node - a system that can easily be created using QEMU.
Inside VMs on a hypervisor with quite some memory overcommit, this is
fairly easy to trigger.
Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Shile Zhang <shile.zhang(a)linux.alibaba.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Shile Zhang <shile.zhang(a)linux.alibaba.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Alexander Duyck <alexander.duyck(a)gmail.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/page_alloc.c~mm-page_alloc-fix-watchdog-soft-lockups-during-set_zone_contiguous
+++ a/mm/page_alloc.c
@@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zo
if (!__pageblock_pfn_to_page(block_start_pfn,
block_end_pfn, zone))
return;
+ cond_resched();
}
/* We confirm that there is no hole */
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-page_alloc-fix-watchdog-soft-lockups-during-set_zone_contiguous.patch
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
mm-memory_hotplug-remove-is_mem_section_removable.patch
On Fri, 13 Mar 2020 at 06:41, Sowjanya Komatineni
<skomatineni(a)nvidia.com> wrote:
>
> Tegra host supports HW busy detection and timeouts based on the
> count programmed in SDHCI_TIMEOUT_CONTROL register and max busy
> timeout it supports is 11s in finite busy wait mode.
>
> Some operations like SLEEP_AWAKE, ERASE and flush cache through
> SWITCH commands take longer than 11s and Tegra host supports
> infinite HW busy wait mode where HW waits forever till the card
> is busy without HW timeout.
>
> This patch implements Tegra specific set_timeout sdhci_ops to allow
> switching between finite and infinite HW busy detection wait modes
> based on the device command expected operation time.
>
> Signed-off-by: Sowjanya Komatineni <skomatineni(a)nvidia.com>
> ---
> drivers/mmc/host/sdhci-tegra.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
> index a25c3a4..fa8f6a4 100644
> --- a/drivers/mmc/host/sdhci-tegra.c
> +++ b/drivers/mmc/host/sdhci-tegra.c
> @@ -45,6 +45,7 @@
> #define SDHCI_TEGRA_CAP_OVERRIDES_DQS_TRIM_SHIFT 8
>
> #define SDHCI_TEGRA_VENDOR_MISC_CTRL 0x120
> +#define SDHCI_MISC_CTRL_ERASE_TIMEOUT_LIMIT BIT(0)
> #define SDHCI_MISC_CTRL_ENABLE_SDR104 0x8
> #define SDHCI_MISC_CTRL_ENABLE_SDR50 0x10
> #define SDHCI_MISC_CTRL_ENABLE_SDHCI_SPEC_300 0x20
> @@ -1227,6 +1228,34 @@ static u32 sdhci_tegra_cqhci_irq(struct sdhci_host *host, u32 intmask)
> return 0;
> }
>
> +static void tegra_sdhci_set_timeout(struct sdhci_host *host,
> + struct mmc_command *cmd)
> +{
> + u32 val;
> +
> + /*
> + * HW busy detection timeout is based on programmed data timeout
> + * counter and maximum supported timeout is 11s which may not be
> + * enough for long operations like cache flush, sleep awake, erase.
> + *
> + * ERASE_TIMEOUT_LIMIT bit of VENDOR_MISC_CTRL register allows
> + * host controller to wait for busy state until the card is busy
> + * without HW timeout.
> + *
> + * So, use infinite busy wait mode for operations that may take
> + * more than maximum HW busy timeout of 11s otherwise use finite
> + * busy wait mode.
> + */
> + val = sdhci_readl(host, SDHCI_TEGRA_VENDOR_MISC_CTRL);
> + if (cmd && cmd->busy_timeout >= 11 * HZ)
> + val |= SDHCI_MISC_CTRL_ERASE_TIMEOUT_LIMIT;
> + else
> + val &= ~SDHCI_MISC_CTRL_ERASE_TIMEOUT_LIMIT;
> + sdhci_writel(host, val, SDHCI_TEGRA_VENDOR_MISC_CTRL);
> +
> + __sdhci_set_timeout(host, cmd);
kernel build on arm and arm64 architecture failed on stable-rc 4.19
(arm), 5.4 (arm64) and 5.5 (arm64)
drivers/mmc/host/sdhci-tegra.c: In function 'tegra_sdhci_set_timeout':
drivers/mmc/host/sdhci-tegra.c:1256:2: error: implicit declaration of
function '__sdhci_set_timeout'; did you mean
'tegra_sdhci_set_timeout'? [-Werror=implicit-function-declaration]
__sdhci_set_timeout(host, cmd);
^~~~~~~~~~~~~~~~~~~
tegra_sdhci_set_timeout
Full build log,
https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-5.5/D…https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-5.4/D…https://ci.linaro.org/view/lkft/job/openembedded-lkft-linux-stable-rc-4.19/…
- Naresh
Hi Linus,
Linux 5.7-rc1 reboot/powereoff hangs on AMD Ryzen 5 PRO 2400GE
system.
I isolated the commit to:
Revering the following commit fixes the problem.
commit 487eca11a321ef33bcf4ca5adb3c0c4954db1b58
Author: Prike Liang <Prike.Liang(a)amd.com>
Date: Tue Apr 7 20:21:26 2020 +0800
drm/amdgpu: fix gfx hang during suspend with video playback (v2)
The system will be hang up during S3 suspend because of SMU is
pending for GC not respose the register CP_HQD_ACTIVE access
request.This issue root cause of accessing the GC register under
enter GFX CGGPG and can be fixed by disable GFX CGPG before perform
suspend.
v2: Use disable the GFX CGPG instead of RLC safe mode guard.
Signed-off-by: Prike Liang <Prike.Liang(a)amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang(a)amd.com>
Reviewed-by: Huang Rui <ray.huang(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
I did the bisect on Linux 5.6.5-rc1
git bisect start
# good: [0a27a29496060843ae3a8fe78aaec0062cbd5dfa] Linux 5.6.4
git bisect good 0a27a29496060843ae3a8fe78aaec0062cbd5dfa
# bad: [576aa353744ce5f1279071363e4a55e97f486f39] Linux 5.6.5-rc1
git bisect bad 576aa353744ce5f1279071363e4a55e97f486f39
# good: [7509db5d111a5763a199902052eecc480e0ec724] x86/tsc_msr: Make MSR
derived TSC frequency more accurate
git bisect good 7509db5d111a5763a199902052eecc480e0ec724
# good: [15f1ead7d7966d087352ba2cf81a1759b25ad163] scsi: lpfc: Fix
broken Credit Recovery after driver load
git bisect good 15f1ead7d7966d087352ba2cf81a1759b25ad163
# good: [9e52b4ab5fadd803c8c2e617aa8c151720757fb1] s390/diag: fix
display of diagnose call statistics
git bisect good 9e52b4ab5fadd803c8c2e617aa8c151720757fb1
# good: [3e1e6903924fd6c95db7e46c5baa41dfa0f46fdb]
powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap
PTE entries
git bisect good 3e1e6903924fd6c95db7e46c5baa41dfa0f46fdb
# good: [0344e0fee2f904ebce1d58bec8ace3ee9cf5f777] drm/dp_mst: Fix
clearing payload state on topology disable
git bisect good 0344e0fee2f904ebce1d58bec8ace3ee9cf5f777
# bad: [136881c0420bb52d5f21f75688f5aee1cf401737] perf/core: Unify
{pinned,flexible}_sched_in()
git bisect bad 136881c0420bb52d5f21f75688f5aee1cf401737
# bad: [0d928b424b99cfe0e7806c530f6039a053d5082d] drm/i915/ggtt: do not
set bits 1-11 in gen12 ptes
git bisect bad 0d928b424b99cfe0e7806c530f6039a053d5082d
# bad: [a655a99c9f6a1d819d695fca6d48b450449f45ee] drm/amdgpu: fix gfx
hang during suspend with video playback (v2)
git bisect bad a655a99c9f6a1d819d695fca6d48b450449f45ee
# first bad commit: [a655a99c9f6a1d819d695fca6d48b450449f45ee]
drm/amdgpu: fix gfx hang during suspend with video playback (v2)
thanks,
-- Shuah
From: Masahiro Yamada <masahiroy(a)kernel.org>
[ Upstream commit 63b903dfebdea92aa92ad337d8451a6fbfeabf9d ]
As far as I understood from the Kconfig help text, this build rule is
used to rebuild the driver firmware, which runs on an old m68k-based
chip. So, you need m68k tools for the firmware rebuild.
wanxl.c is a PCI driver, but CONFIG_M68K does not select CONFIG_HAVE_PCI.
So, you cannot enable CONFIG_WANXL_BUILD_FIRMWARE for ARCH=m68k. In other
words, ifeq ($(ARCH),m68k) is false here.
I am keeping the dead code for now, but rebuilding the firmware requires
'as68k' and 'ld68k', which I do not have in hand.
Instead, the kernel.org m68k GCC [1] successfully built it.
Allowing a user to pass in CROSS_COMPILE_M68K= is handier.
[1] https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/9.2.0/…
Suggested-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wan/Kconfig | 2 +-
drivers/net/wan/Makefile | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wan/Kconfig b/drivers/net/wan/Kconfig
index dd1a147f29716..058d77d2e693d 100644
--- a/drivers/net/wan/Kconfig
+++ b/drivers/net/wan/Kconfig
@@ -200,7 +200,7 @@ config WANXL_BUILD_FIRMWARE
depends on WANXL && !PREVENT_FIRMWARE_BUILD
help
Allows you to rebuild firmware run by the QUICC processor.
- It requires as68k, ld68k and hexdump programs.
+ It requires m68k toolchains and hexdump programs.
You should never need this option, say N.
diff --git a/drivers/net/wan/Makefile b/drivers/net/wan/Makefile
index 701f5d2fe3b61..995277c657a1e 100644
--- a/drivers/net/wan/Makefile
+++ b/drivers/net/wan/Makefile
@@ -40,17 +40,17 @@ $(obj)/wanxl.o: $(obj)/wanxlfw.inc
ifeq ($(CONFIG_WANXL_BUILD_FIRMWARE),y)
ifeq ($(ARCH),m68k)
- AS68K = $(AS)
- LD68K = $(LD)
+ M68KAS = $(AS)
+ M68KLD = $(LD)
else
- AS68K = as68k
- LD68K = ld68k
+ M68KAS = $(CROSS_COMPILE_M68K)as
+ M68KLD = $(CROSS_COMPILE_M68K)ld
endif
quiet_cmd_build_wanxlfw = BLD FW $@
cmd_build_wanxlfw = \
- $(CPP) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi $< | $(AS68K) -m68360 -o $(obj)/wanxlfw.o; \
- $(LD68K) --oformat binary -Ttext 0x1000 $(obj)/wanxlfw.o -o $(obj)/wanxlfw.bin; \
+ $(CPP) -D__ASSEMBLY__ -Wp,-MD,$(depfile) -I$(srctree)/include/uapi $< | $(M68KAS) -m68360 -o $(obj)/wanxlfw.o; \
+ $(M68KLD) --oformat binary -Ttext 0x1000 $(obj)/wanxlfw.o -o $(obj)/wanxlfw.bin; \
hexdump -ve '"\n" 16/1 "0x%02X,"' $(obj)/wanxlfw.bin | sed 's/0x ,//g;1s/^/static const u8 firmware[]={/;$$s/,$$/\n};\n/' >$(obj)/wanxlfw.inc; \
rm -f $(obj)/wanxlfw.bin $(obj)/wanxlfw.o
--
2.20.1