Hi Paul,
On Wed, 2022-09-21 at 13:42 +0200, Paul Menzel wrote:
Dear Linux folks,
Moving from Linux 5.10.113 to 5.15.69, starting Mozilla Thunderbird or Mozilla Firefox with the home directory on NFS gets both programs killed, and Linux 5.15.69 logs:
[ 3827.604396] BUG: unable to handle page fault for address: 000000001d473c07
[ 3827.611297] #PF: supervisor read access in kernel mode
[ 3827.616452] #PF: error_code(0x0000) - not-present page
[ 3827.621604] PGD 0 P4D 0
[ 3827.624152] Oops: 0000 [#1] SMP PTI
[ 3827.627657] CPU: 0 PID: 2378 Comm: firefox Not tainted 5.15.69.mx64.435 #1
[ 3827.634551] Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.20.0 12/09/2021
[ 3827.642659] RIP: 0010:nfs_scan_commit_list+0x1e/0x100 [nfs]
[ 3827.648256] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 10 4c 8b 2f 48 89 3c 24 89 4c 24 0c <49> 8b 5d 00 4c 39 ef 0f 84 c3 00 00 00 48 89 f5 49 89 d6 4d 89 ef
[ 3827.667057] RSP: 0018:ffffc90002097ce0 EFLAGS: 00010282
[ 3827.672294] RAX: 000000006329dcd6 RBX: ffffc90002097d60 RCX: 000000007fffffff
[ 3827.679440] RDX: ffffc90002097d60 RSI: ffffc90002097d50 RDI: ffff8881d7618b38
[ 3827.686587] RBP: ffffc90002097d50 R08: 0000000000000001 R09: 0000000000000000
[ 3827.693734] R10: 0000000000000000 R11: 61c8864680b583eb R12: 0000000000000000
[ 3827.700880] R13: 000000001d473c07 R14: 0000000000000001 R15: 0000000000000000
[ 3827.708027] FS: 00007fa6141f2780(0000) GS:ffff88881dc00000(0000) knlGS:0000000000000000
[ 3827.716131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3827.721886] CR2: 000000001d473c07 CR3: 000000012dae0006 CR4: 00000000003706f0
[ 3827.729034] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3827.736180] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3827.743328] Call Trace:
[ 3827.745779] <TASK>
[ 3827.747883] nfs_scan_commit+0x76/0xb0 [nfs]
[ 3827.752167] __nfs_commit_inode+0x108/0x180 [nfs]
[ 3827.756886] nfs_wb_all+0x59/0x110 [nfs]
[ 3827.760822] nfs4_inode_return_delegation+0x58/0x90 [nfsv4]
[ 3827.766413] nfs4_proc_remove+0x101/0x110 [nfsv4]
[ 3827.771130] nfs_unlink+0xf5/0x2d0 [nfs]
[ 3827.775065] vfs_unlink+0x10b/0x280
[ 3827.778563] do_unlinkat+0x19e/0x2c0
[ 3827.782158] __x64_sys_unlink+0x3e/0x60
[ 3827.786002] ? __x64_sys_readlink+0x1b/0x30
[ 3827.790192] do_syscall_64+0x40/0x90
[ 3827.793779] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 3827.798847] RIP: 0033:0x7fa6142e2aa7
[ 3827.802435] Code: f0 ff ff 73 01 c3 48 8b 0d be 03 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 03 0d 00 f7 d8 64 89 01 48
[ 3827.821264] RSP: 002b:00007fff37879a08 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
[ 3827.828848] RAX: ffffffffffffffda RBX: 0000000080004005 RCX: 00007fa6142e2aa7
[ 3827.835997] RDX: 0000000077120e8d RSI: 00007fa614383520 RDI: 00007fa605425b88
[ 3827.843145] RBP: 00007fa605425b88 R08: 00007fff37879add R09: 0000000000000000
[ 3827.850291] R10: 00007fa614362ae0 R11: 0000000000000202 R12: 0000000077120e8d
[ 3827.857439] R13: 00007fff37879add R14: 00007fa6141f26c8 R15: 0000000000000065
[ 3827.864586] </TASK>
[ 3827.866776] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp stp mrp llc amdgpu snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio i915 iommu_v2 gpu_sched drm_ttm_helper iosf_mbi ttm drm_kms_helper x86_pkg_temp_thermal kvm_intel drm kvm snd_hda_codec_hdmi intel_gtt i2c_algo_bit fb_sys_fops syscopyarea sysfillrect snd_hda_intel input_leds led_class snd_intel_dspcfg sysimgblt e1000e snd_hda_codec hid_logitech_hidpp snd_hda_core hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_hwdep snd_rawmidi snd_pcm snd_timer uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd wmi_bmof soundcore wmi iTCO_wdt video irqbypass crc32c_intel iTCO_vendor_support nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables unix ipv6 autofs4
[ 3827.935422] CR2: 000000001d473c07
[ 3827.938745] ---[ end trace d7dc2bc122fe8836 ]---
Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for nfs4_inode_return_delegation()") into 5.15.69 from the upstream kernel tree fix the problem?
8<---------------------------------------------------
From 6e176d47160cec8bcaa28d9aa06926d72d54237c Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Sun, 10 Oct 2021 10:58:12 +0200
Subject: [PATCH] NFSv4: Fixes for nfs4_inode_return_delegation()
We mustn't call nfs_wb_all() on anything other than a regular file. Furthermore, we can exit early when we don't hold a delegation.
Reported-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/delegation.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 11118398f495..7c9eb679dbdb 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -755,11 +755,13 @@ int nfs4_inode_return_delegation(struct inode *inode)
 	struct nfs_delegation *delegation;
 
 	delegation = nfs_start_delegation_return(nfsi);
-	/* Synchronous recall of any application leases */
-	break_lease(inode, O_WRONLY | O_RDWR);
-	nfs_wb_all(inode);
-	if (delegation != NULL)
+	if (delegation != NULL) {
+		/* Synchronous recall of any application leases */
+		break_lease(inode, O_WRONLY | O_RDWR);
+		if (S_ISREG(inode->i_mode))
+			nfs_wb_all(inode);
 		return nfs_end_delegation_return(inode, delegation, 1);
+	}
 	return 0;
 }
 
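The control flow the patch establishes can be modeled in user space. This is a hypothetical sketch, not kernel code: `struct toy_inode` and the `toy_*` helpers are made-up stand-ins for `struct inode`, `break_lease()`, `nfs_wb_all()` and `nfs_end_delegation_return()`, and a boolean flag stands in for a non-NULL return from `nfs_start_delegation_return()`. Only `S_ISREG()` is the real macro from `<sys/stat.h>`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical stand-in for struct inode: only what the sketch needs. */
struct toy_inode {
	mode_t i_mode;              /* file type bits, checked with S_ISREG() */
	bool has_delegation;        /* models nfs_start_delegation_return() != NULL */
	int leases_broken;          /* counts toy_break_lease() calls */
	int writebacks;             /* counts toy_nfs_wb_all() calls */
	int delegations_returned;   /* counts toy_end_delegation_return() calls */
};

static void toy_break_lease(struct toy_inode *inode)
{
	inode->leases_broken++;
}

/* Models nfs_wb_all(): in this sketch it is only valid on regular files. */
static void toy_nfs_wb_all(struct toy_inode *inode)
{
	assert(S_ISREG(inode->i_mode));
	inode->writebacks++;
}

static int toy_end_delegation_return(struct toy_inode *inode)
{
	inode->has_delegation = false;
	inode->delegations_returned++;
	return 0;
}

/* Same control flow as nfs4_inode_return_delegation() after the patch. */
static int toy_inode_return_delegation(struct toy_inode *inode)
{
	if (!inode->has_delegation)
		return 0;               /* early exit: no delegation held */
	toy_break_lease(inode);     /* recall application leases */
	if (S_ISREG(inode->i_mode))
		toy_nfs_wb_all(inode);  /* flush only regular files */
	return toy_end_delegation_return(inode);
}
```

With `i_mode = S_IFLNK | 0777` and a delegation held, the model returns the delegation without any writeback; an unconditional call, as in the pre-patch ordering, would reach `toy_nfs_wb_all()` and trip its `assert()`.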
Dear Trond,
Thank you for the quick reply.
On 21.09.22 at 14:44, Trond Myklebust wrote:
Does cherry-picking commit 6e176d47160c ("NFSv4: Fixes for nfs4_inode_return_delegation()") into 5.15.69 from the upstream kernel tree fix the problem?
Indeed with that commit, present since v5.16-rc1, we are unable to reproduce the issue, so it seems to be the fix. It looks like there are not a lot of 5.15 NFS users out there. ;-)
Kind regards,
Paul
On Thu, 2022-09-22 at 13:42 +0200, Paul Menzel wrote:
Indeed with that commit, present since v5.16-rc1, we are unable to reproduce the issue, so it seems to be the fix. It looks like there are not a lot of 5.15 NFS users out there. ;-)
I believe this is a dependency that was introduced by the backport of commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68, so the reason it wasn't seen earlier is that the change is very recent.
FYI Greg and Sasha: please also consider pulling 6e176d47160c ("NFSv4: Fixes for nfs4_inode_return_delegation()") into that stable series.
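The thread does not spell out why the backport needs this fix; the oops only shows `nfs_wb_all()` reached from `nfs4_inode_return_delegation()` during `unlink()`. One plausible reading, offered purely as an assumption based on the commit title "NFS: Save some space in the inode", is that e591b298d7ec lets fields used by only one inode type share storage, so regular-file state read on a non-regular inode is really some other type's bytes. The toy model below illustrates that idea; `struct toy_nfs_inode` and both helpers are hypothetical and do not reflect the real `struct nfs_inode` layout.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical layout, NOT the real struct nfs_inode: per-type state
 * shares storage in a union to save space. */
struct toy_nfs_inode {
	mode_t i_mode;              /* file type bits */
	union {
		struct {
			uint64_t cookie;    /* directory-only state */
		} dir;
		struct {
			uint64_t ncommit;   /* regular-file-only state */
		} file;
	} u;
};

/* Unguarded access: on a non-regular inode this reinterprets whatever
 * another inode type stored in the shared bytes. */
static uint64_t toy_unguarded_ncommit(const struct toy_nfs_inode *ni)
{
	return ni->u.file.ncommit;
}

/* Guarded access in the spirit of the S_ISREG() check added by
 * 6e176d47160c: only regular files have meaningful file state. */
static int toy_guarded_ncommit(const struct toy_nfs_inode *ni, uint64_t *out)
{
	assert(ni != NULL && out != NULL);
	if (!S_ISREG(ni->i_mode))
		return -1;              /* not a regular file: skip */
	*out = ni->u.file.ncommit;
	return 0;
}
```

In the model, a directory whose `u.dir.cookie` holds 12345 reports 12345 "pending commits" through the unguarded helper, while the guarded helper refuses it; in the kernel, a bogus value in such a field could just as well be a garbage list pointer.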
[adding Greg and Sasha to the recipients, to ensure they see this; CCing Kurt as well, to keep him in the loop]
On 22.09.22 15:44, Trond Myklebust wrote:
I believe this is a dependency that was introduced by the backport of commit e591b298d7ec ("NFS: Save some space in the inode") into 5.15.68, so the reason it wasn't seen earlier is that the change is very recent.
Side note: I wonder if that is causing this problem from Kurt as well: https://lore.kernel.org/all/f6755107-b62c-a388-0ab5-0a6633bf9082@garloff.de/
Greg, I noticed that in the past few days you added quite a few patches to the queue for the next 5.15.y release, but this one was not among them, afaics. So just to be sure: is it still on your todo list, or is more needed to get 6e176d47160c added in time for the next stable -rc?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On Mon, Sep 26, 2022 at 08:00:46AM +0200, Thorsten Leemhuis wrote:
Greg, I noticed that in the past few days you added quite a few patches to the queue for the next 5.15.y release, but this one was not among them, afaics. So just to be sure: is it still on your todo list, or is more needed to get 6e176d47160c added in time for the next stable -rc?
I don't see any request by anyone in the stable@vger.kernel.org history asking for that commit to be added, so no, it was not in my queue.
I'll go add it now, thanks.
greg k-h
Hi Thorsten,
thanks for collecting this issue and providing relevant context!
On 26/09/2022 08:00, Thorsten Leemhuis wrote:
Side note: I wonder if that is causing this problem from Kurt as well: https://lore.kernel.org/all/f6755107-b62c-a388-0ab5-0a6633bf9082@garloff.de/
Looks like it: after confirming that the 5.15.69 kernel worked fine again with those last three NFS commits backed out, I reapplied them and cherry-picked commit 6e176d47160c as suggested. The kernel has worked flawlessly so far, so this does indeed seem to be a requirement for e591b298d7ec not to cause harm.
So by all means, Greg, please put this in the stable queue unless the NFS wizards out there consider it safer to revert e591b298d7ec instead.
Thanks,
On Tue, Sep 27, 2022 at 08:59:31PM +0200, Kurt Garloff wrote:
So by all means, Greg, please put this in the stable queue unless the NFS wizards out there consider it safer to revert e591b298d7ec instead.
Already queued up for the next 5.15.y release that will happen in a few hours, thanks for testing.
greg k-h
Hi Greg,
On 28/09/2022 08:51, Greg KH wrote:
Already queued up for the next 5.15.y release that will happen in a few hours, thanks for testing.
And -- unsurprisingly -- I can confirm that NFS in 5.15.71 does work again, indeed.
Thanks!