On Thu, 2019-01-24 at 19:58 +0000, Trond Myklebust wrote:
On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
I could use some help figuring out the cause of some serious NFS client issues I'm having with the 4.20.3 kernel which I did not see under 4.19.15.
I have a network of about 130 desktops (plus a bunch of other machines, VMs and the like) running Fedora 29 connecting to six NFS servers running CentOS 7.6 (with the heavily patched vendor kernel 3.10.0-957.1.3). All machines involved are x86_64. We use Kerberized NFSv4, generally with sec=krb5i. The exports are generally made with "(rw,async,sec=krb5i:krb5p)".
Since I booted those clients into 4.20.3 I've started seeing processes getting stuck in the D state. The system itself will seem OK (except for the high load average) as long as I don't touch the hung NFS mount. Nothing was logged to dmesg or to the journal. So far booting back into the 4.19.15 kernel has cleared up the problem. I cannot yet reproduce this on demand; I've tried, but it is probably related to some specific usage pattern.
Has anyone else seen issues like this? Can anyone help me to get more useful information that might point to the problem? I still haven't learned how to debug NFS issues properly. And if there's a stress test tool I could easily run that might help to reproduce the issue, I'd be happy to run it.
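One way to start collecting the "more useful information" asked for above is to find which tasks are stuck in D (uninterruptible sleep) state and where in the kernel they are waiting. The following is a minimal Python sketch, not something proposed in this thread; the function names `parse_stat` and `d_state_processes` are hypothetical. It scans `/proc/<pid>/stat` for tasks in state D and, when run as root, prints each one's `/proc/<pid>/stack`. As an alternative, `echo w > /proc/sysrq-trigger` (as root, with sysrq enabled) dumps blocked tasks to the kernel log.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first " (" and the *last* ')' rather than on whitespace.
    """
    pid_part, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid_part), comm, state


def d_state_processes(proc: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every task currently in uninterruptible sleep (D)."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited mid-scan or is unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
        # Kernel stack of the blocked task (root only); shows where it waits.
        try:
            with open(f"/proc/{pid}/stack") as f:
                print(f.read())
        except OSError:
            pass
```

Running this a few times while a mount is hung shows whether the same tasks stay blocked, and the stacks typically point at the RPC or GSS code path they are waiting in.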
I note that 4.20.4 is out; I see one sunrpc fix which I guess could be related (sunrpc: handle ENOMEM in rpcb_getport_async) but the systems involved have plenty of free memory so I doubt that's it. I'll certainly try it anyway.
Various package versions:
  kernel-4.20.3-200.fc29.x86_64 (the problematic kernel)
  kernel-4.19.15-300.fc29.x86_64 (the functional kernel)
  nfs-utils-2.3.3-1.rc2.fc29.x86_64
  gssproxy-0.8.0-6.fc29.x86_64
  krb5-libs-1.16.1-25.fc29.i686
Thanks in advance for any help or advice,
- J<
Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior regression") was supposed to be marked for stable as a fix. Chuck & Anna?
Looks like I missed that, sorry!
Stable folks, can you please backport deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior regression") to v4.20?
Thanks, Anna
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
On Fri, Jan 25, 2019 at 07:13:27PM +0000, Schumaker, Anna wrote:
Queued for 4.20, thank you.
--
Thanks,
Sasha
linux-stable-mirror@lists.linaro.org