Hi, this is your Linux kernel regression tracker speaking. Top-posting
for once, to make this easily accessible to everyone.
The issue below, which started to happen between v5.10.80..v5.10.90, was
recently reported to bugzilla, but the reporter didn't even get a single
reply afaics. Could somebody maybe take a look? Bisection is likely not
easy in this case, so a few tips to narrow down the area to search might
help a lot here.
https://bugzilla.kernel.org/show_bug.cgi?id=215562
Ciao, Thorsten
On 03.02.22 16:03, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
>
> There is a regression in bugzilla.kernel.org I'd like to add to the
> tracking:
>
> #regzbot introduced: v5.10.80..v5.10.90
> #regzbot from: Patrick Schaaf <kernelorg(a)bof.de>
> #regzbot title: mm: unable to handle page fault in cache_reap
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215562
>
> Quote:
>
>> We've been running self-built 5.10.x kernels on DL380 hosts for quite a while, also inside the VMs there.
>>
>> With I think 5.10.90 three weeks or so back, we experienced a lockup upon umounting a larger, dirty filesystem on the host side, unfortunately without capturing a backtrace back then.
>>
>> Today something that felt similar happened again, on a machine running 5.10.93 both on the host and inside its 10 various VMs.
>>
>> Problem showed shortly (minutes) after shutting down one of the VMs (few hundred GB memory / dataset, VM shutdown was complete already; direct I/O), and then some LVM volume renames, a quick short outside ext4 mount followed by an umount (8 GB volume, probably a few hundred megabyte only to write). Actually monitoring suggests that disk writes were already done about a minute before the onset.
>>
>> What we then experienced was the following BUG:, followed by one CPU after the other saying goodbye with soft lockup messages over the course of a few minutes; meanwhile there was no more pinging the box, logging in on console, etc. We hard powercycled and it recovered fully.
>>
>> here's the BUG that was logged; if it is useful for someone to see the followup soft lockup messages, tell me + I'll add them.
>>
>> Feb 02 15:22:27 kvm3j kernel: BUG: unable to handle page fault for address: ffffebde00000008
>> Feb 02 15:22:27 kvm3j kernel: #PF: supervisor read access in kernel mode
>> Feb 02 15:22:27 kvm3j kernel: #PF: error_code(0x0000) - not-present page
>> Feb 02 15:22:27 kvm3j kernel: Oops: 0000 [#1] SMP PTI
>> Feb 02 15:22:27 kvm3j kernel: CPU: 7 PID: 39833 Comm: kworker/7:0 Tainted: G I 5.10.93-kvm #1
>> Feb 02 15:22:27 kvm3j kernel: Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/20/2013
>> Feb 02 15:22:27 kvm3j kernel: Workqueue: events cache_reap
>> Feb 02 15:22:27 kvm3j kernel: RIP: 0010:free_block.constprop.0+0xc0/0x1f0
>> Feb 02 15:22:27 kvm3j kernel: Code: 4c 8b 16 4c 89 d0 48 01 e8 0f 82 32 01 00 00 4c 89 f2 48 bb 00 00 00 00 00 ea ff ff 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 01 d8 <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 >
>> Feb 02 15:22:27 kvm3j kernel: RSP: 0018:ffffc9000252bdc8 EFLAGS: 00010086
>> Feb 02 15:22:27 kvm3j kernel: RAX: ffffebde00000000 RBX: ffffea0000000000 RCX: ffff888889141b00
>> Feb 02 15:22:27 kvm3j kernel: RDX: 0000777f80000000 RSI: ffff893d3edf3400 RDI: ffff8881000403c0
>> Feb 02 15:22:27 kvm3j kernel: RBP: 0000000080000000 R08: ffff888100041300 R09: 0000000000000003
>> Feb 02 15:22:27 kvm3j kernel: R10: 0000000000000000 R11: ffff888100041308 R12: dead000000000122
>> Feb 02 15:22:27 kvm3j kernel: R13: dead000000000100 R14: 0000777f80000000 R15: ffff893ed8780d60
>> Feb 02 15:22:27 kvm3j kernel: FS: 0000000000000000(0000) GS:ffff893d3edc0000(0000) knlGS:0000000000000000
>> Feb 02 15:22:27 kvm3j kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Feb 02 15:22:27 kvm3j kernel: CR2: ffffebde00000008 CR3: 000000048c4aa002 CR4: 00000000001726e0
>> Feb 02 15:22:27 kvm3j kernel: Call Trace:
>> Feb 02 15:22:27 kvm3j kernel: drain_array_locked.constprop.0+0x2e/0x80
>> Feb 02 15:22:27 kvm3j kernel: drain_array.constprop.0+0x54/0x70
>> Feb 02 15:22:27 kvm3j kernel: cache_reap+0x6c/0x100
>> Feb 02 15:22:27 kvm3j kernel: process_one_work+0x1cf/0x360
>> Feb 02 15:22:27 kvm3j kernel: worker_thread+0x45/0x3a0
>> Feb 02 15:22:27 kvm3j kernel: ? process_one_work+0x360/0x360
>> Feb 02 15:22:27 kvm3j kernel: kthread+0x116/0x130
>> Feb 02 15:22:27 kvm3j kernel: ? kthread_create_worker_on_cpu+0x40/0x40
>> Feb 02 15:22:27 kvm3j kernel: ret_from_fork+0x22/0x30
>> Feb 02 15:22:27 kvm3j kernel: Modules linked in: hpilo
>> Feb 02 15:22:27 kvm3j kernel: CR2: ffffebde00000008
>> Feb 02 15:22:27 kvm3j kernel: ---[ end trace ded3153d86a92898 ]---
>> Feb 02 15:22:27 kvm3j kernel: RIP: 0010:free_block.constprop.0+0xc0/0x1f0
>> Feb 02 15:22:27 kvm3j kernel: Code: 4c 8b 16 4c 89 d0 48 01 e8 0f 82 32 01 00 00 4c 89 f2 48 bb 00 00 00 00 00 ea ff ff 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 01 d8 <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 >
>> Feb 02 15:22:27 kvm3j kernel: RSP: 0018:ffffc9000252bdc8 EFLAGS: 00010086
>> Feb 02 15:22:27 kvm3j kernel: RAX: ffffebde00000000 RBX: ffffea0000000000 RCX: ffff888889141b00
>> Feb 02 15:22:27 kvm3j kernel: RDX: 0000777f80000000 RSI: ffff893d3edf3400 RDI: ffff8881000403c0
>> Feb 02 15:22:27 kvm3j kernel: RBP: 0000000080000000 R08: ffff888100041300 R09: 0000000000000003
>> Feb 02 15:22:27 kvm3j kernel: R10: 0000000000000000 R11: ffff888100041308 R12: dead000000000122
>> Feb 02 15:22:27 kvm3j kernel: R13: dead000000000100 R14: 0000777f80000000 R15: ffff893ed8780d60
>> Feb 02 15:22:27 kvm3j kernel: FS: 0000000000000000(0000) GS:ffff893d3edc0000(0000) knlGS:0000000000000000
>> Feb 02 15:22:27 kvm3j kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Feb 02 15:22:27 kvm3j kernel: CR2: ffffebde00000008 CR3: 000000048c4aa002 CR4: 00000000001726e0
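As a rough cross-check of the oops above (my own reading, not something
from the report): decoding the Code: bytes by hand, the instructions
before the faulting load appear to be add %rbp,%rax; add %rdx,%rax;
shr $0xc,%rax; shl $0x6,%rax; add %rbx,%rax, followed by the faulting
mov 0x8(%rax),%rdx. Assuming that decode is right and that R10, RBP and
R14 still hold the values the sequence used, the sketch below replays
the arithmetic and arrives at the address in CR2. The function and
constant names are made up for illustration.

```python
MASK64 = (1 << 64) - 1

# Values copied from the register dump in the report.
R10 = 0x0000000000000000  # loaded by the first instruction, mov (%rsi),%r10
RBP = 0x0000000080000000
R14 = 0x0000777F80000000  # copied into RDX before the add
CR2 = 0xFFFFEBDE00000008  # faulting address reported by the kernel

# Immediate operand of the movabs in the Code: line
# (bytes 48 bb 00 00 00 00 00 ea ff ff).
MOVABS_IMM = 0xFFFFEA0000000000


def replay_fault_address(r10, rbp, r14, imm):
    """Replay the hand-decoded instruction sequence (hypothetical helper)."""
    rax = (r10 + rbp) & MASK64   # add %rbp,%rax
    rax = (rax + r14) & MASK64   # add %rdx,%rax
    rax >>= 12                   # shr $0xc,%rax
    rax = (rax << 6) & MASK64    # shl $0x6,%rax
    rax = (rax + imm) & MASK64   # add %rbx,%rax
    return (rax + 8) & MASK64    # mov 0x8(%rax),%rdx is the faulting load


if __name__ == "__main__":
    addr = replay_fault_address(R10, RBP, R14, MOVABS_IMM)
    print(f"computed {addr:#018x}, CR2 {CR2:#018x}")
```

With R10 being zero, this would be consistent with a NULL object pointer
being converted to its struct page, but that is only an interpretation
of the register dump and would need confirming against the source.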
>
> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)
>
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately
> therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply, that's in everyone's interest.
>
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt this regression.
>
> ---
> Additional information about regzbot:
>
> If you want to know more about regzbot, check out its web-interface, the
> getting started guide, and/or the reference documentation:
>
> https://linux-regtracking.leemhuis.info/regzbot/
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
>
> The last two documents will explain how you can interact with regzbot
> yourself if you want to.
>
> Hint for reporters: when reporting a regression it's in your interest to
> tell #regzbot about it in the report, as that will ensure the regression
> gets on the radar of regzbot and the regression tracker. That's in your
> interest, as they will make sure the report won't fall through the
> cracks unnoticed.
>
> Hint for developers: you normally don't need to care about regzbot once
> it's involved. Fix the issue as you normally would, just remember to
> include a 'Link:' tag to the report in the commit message, as explained
> in Documentation/process/submitting-patches.rst
> That aspect was recently made more explicit in commit 1f57bd42b77c:
> https://git.kernel.org/linus/1f57bd42b77c
This is the start of the stable review cycle for the 4.19.230 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 16 Feb 2022 09:24:36 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.230-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
    Linux 4.19.230-rc1

Song Liu <song(a)kernel.org>
    perf: Fix list corruption in perf_cgroup_switch()

Armin Wolf <W_Armin(a)gmx.de>
    hwmon: (dell-smm) Speed up setting of fan speed

Kees Cook <keescook(a)chromium.org>
    seccomp: Invalidate seccomp mode to catch death failures

Johan Hovold <johan(a)kernel.org>
    USB: serial: cp210x: add CPI Bulk Coin Recycler id

Johan Hovold <johan(a)kernel.org>
    USB: serial: cp210x: add NCR Retail IO box id

Stephan Brunner <s.brunner(a)stephan-brunner.net>
    USB: serial: ch341: add support for GW Instek USB2.0-Serial devices

Pawel Dembicki <paweldembicki(a)gmail.com>
    USB: serial: option: add ZTE MF286D modem

Cameron Williams <cang1(a)live.co.uk>
    USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320

Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
    usb: gadget: rndis: check size of RNDIS_MSG_SET command

Szymon Heidrich <szymon.heidrich(a)gmail.com>
    USB: gadget: validate interface OS descriptor requests

Udipto Goswami <quic_ugoswami(a)quicinc.com>
    usb: dwc3: gadget: Prevent core from processing stale TRBs

Sean Anderson <sean.anderson(a)seco.com>
    usb: ulpi: Call of_node_put correctly

Sean Anderson <sean.anderson(a)seco.com>
    usb: ulpi: Move of_node_put to ulpi_dev_release

TATSUKAWA KOSUKE (立川 江介) <tatsu-ab1(a)nec.com>
    n_tty: wake up poll(POLLRDNORM) on receiving data

Jakob Koschel <jakobkoschel(a)gmail.com>
    vt_ioctl: add array_index_nospec to VT_ACTIVATE

Jakob Koschel <jakobkoschel(a)gmail.com>
    vt_ioctl: fix array_index_nospec in vt_setactivate

Raju Rangoju <Raju.Rangoju(a)amd.com>
    net: amd-xgbe: disable interrupts during pci removal

Jon Maloy <jmaloy(a)redhat.com>
    tipc: rate limit warning for received illegal binding update

Eric Dumazet <edumazet(a)google.com>
    veth: fix races around rq->rx_notify_masked

Antoine Tenart <atenart(a)kernel.org>
    net: fix a memleak when uncloning an skb dst and its metadata

Antoine Tenart <atenart(a)kernel.org>
    net: do not keep the dst cache when uncloning an skb dst and its metadata

Eric Dumazet <edumazet(a)google.com>
    ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path

Mahesh Bandewar <maheshb(a)google.com>
    bonding: pair enable_port with slave_arr_updates

Samuel Mendoza-Jonas <samjonas(a)amazon.com>
    ixgbevf: Require large buffers for build_skb on 82599VF

Udipto Goswami <quic_ugoswami(a)quicinc.com>
    usb: f_fs: Fix use-after-free for epfile

Fabio Estevam <festevam(a)gmail.com>
    ARM: dts: imx6qdl-udoo: Properly describe the SD card detect

Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
    staging: fbtft: Fix error path in fbtft_driver_module_init()

Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
    ARM: dts: meson: Fix the UART compatible strings

Zechuan Chen <chenzechuan1(a)huawei.com>
    perf probe: Fix ppc64 'perf probe add events failed' case

Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
    net: bridge: fix stale eth hdr pointer in br_dev_xmit

Fabio Estevam <festevam(a)gmail.com>
    ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group

Daniel Borkmann <daniel(a)iogearbox.net>
    bpf: Add kconfig knob for disabling unpriv bpf by default

Jisheng Zhang <jszhang(a)kernel.org>
    net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()

Amelie Delaunay <amelie.delaunay(a)foss.st.com>
    usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend

ZouMingzhe <mingzhe.zou(a)easystack.cn>
    scsi: target: iscsi: Make sure the np under each tpg is unique

Victor Nogueira <victor(a)mojatatu.com>
    net: sched: Clarify error message when qdisc kind is unknown

Olga Kornievskaia <kolga(a)netapp.com>
    NFSv4 expose nfs_parse_server_name function

Olga Kornievskaia <kolga(a)netapp.com>
    NFSv4 remove zero number of fs_locations entries error check

Trond Myklebust <trond.myklebust(a)hammerspace.com>
    NFSv4.1: Fix uninitialised variable in devicenotify

Xiaoke Wang <xkernel.wang(a)foxmail.com>
    nfs: nfs4clinet: check the return value of kstrdup()

Olga Kornievskaia <kolga(a)netapp.com>
    NFSv4 only print the label when its queried

Chuck Lever <chuck.lever(a)oracle.com>
    NFSD: Fix offset type in I/O trace points

Chuck Lever <chuck.lever(a)oracle.com>
    NFSD: Clamp WRITE offsets

Trond Myklebust <trond.myklebust(a)hammerspace.com>
    NFS: Fix initialisation of nfs_client cl_flags field

Pavel Parkhomenko <Pavel.Parkhomenko(a)baikalelectronics.ru>
    net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs

Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
    mmc: sdhci-of-esdhc: Check for error num after setting mask

Roberto Sassu <roberto.sassu(a)huawei.com>
    ima: Allow template selection with ima_template[_fmt]= after ima_hash=

Stefan Berger <stefanb(a)linux.ibm.com>
    ima: Remove ima_policy file before directory

Xiaoke Wang <xkernel.wang(a)foxmail.com>
    integrity: check the return value of audit_log_start()
-------------
Diffstat:
Documentation/sysctl/kernel.txt | 21 +++++++++
Makefile | 4 +-
arch/arm/boot/dts/imx23-evk.dts | 1 -
arch/arm/boot/dts/imx6qdl-udoo.dtsi | 5 +-
arch/arm/boot/dts/meson.dtsi | 8 ++--
drivers/hwmon/dell-smm-hwmon.c | 12 +++--
drivers/mmc/host/sdhci-of-esdhc.c | 8 +++-
drivers/net/bonding/bond_3ad.c | 3 +-
drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 3 ++
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 13 +++---
drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 2 +-
drivers/net/phy/marvell.c | 7 ++-
drivers/net/veth.c | 13 ++++--
drivers/staging/fbtft/fbtft.h | 5 +-
drivers/target/iscsi/iscsi_target_tpg.c | 3 ++
drivers/tty/n_tty.c | 4 +-
drivers/tty/vt/vt_ioctl.c | 5 +-
drivers/usb/common/ulpi.c | 10 ++--
drivers/usb/dwc2/gadget.c | 2 +-
drivers/usb/dwc3/gadget.c | 13 ++++++
drivers/usb/gadget/composite.c | 3 ++
drivers/usb/gadget/function/f_fs.c | 56 +++++++++++++++++------
drivers/usb/gadget/function/rndis.c | 9 ++--
drivers/usb/serial/ch341.c | 1 +
drivers/usb/serial/cp210x.c | 2 +
drivers/usb/serial/ftdi_sio.c | 3 ++
drivers/usb/serial/ftdi_sio_ids.h | 3 ++
drivers/usb/serial/option.c | 2 +
fs/nfs/callback.h | 2 +-
fs/nfs/callback_proc.c | 2 +-
fs/nfs/callback_xdr.c | 18 ++++----
fs/nfs/client.c | 2 +-
fs/nfs/nfs4_fs.h | 3 +-
fs/nfs/nfs4client.c | 5 +-
fs/nfs/nfs4namespace.c | 4 +-
fs/nfs/nfs4state.c | 3 ++
fs/nfs/nfs4xdr.c | 9 ++--
fs/nfsd/nfs3proc.c | 5 ++
fs/nfsd/nfs4proc.c | 5 +-
fs/nfsd/trace.h | 14 +++---
include/net/dst_metadata.h | 14 +++++-
init/Kconfig | 10 ++++
kernel/bpf/syscall.c | 3 +-
kernel/events/core.c | 4 +-
kernel/seccomp.c | 10 ++++
kernel/sysctl.c | 29 ++++++++++--
net/bridge/br_device.c | 6 +--
net/ipv4/ipmr.c | 2 +
net/ipv6/ip6mr.c | 2 +
net/sched/sch_api.c | 2 +-
net/tipc/name_distr.c | 2 +-
security/integrity/ima/ima_fs.c | 2 +-
security/integrity/ima/ima_template.c | 10 ++--
security/integrity/integrity_audit.c | 2 +
tools/perf/util/probe-event.c | 3 ++
55 files changed, 289 insertions(+), 105 deletions(-)