Hello,
I am writing to report an issue with an NFS mount that disappears after an inode revalidation failure (I already sent this in January, but it was probably rejected because of the HTML format). This very old commit (https://github.com/torvalds/linux/commit/cc89684c9a265828ce061037f1f79f4a68c...) shows exactly the problem I have, and this old, resolved issue (https://bugzilla.kernel.org/show_bug.cgi?id=117651) is probably occurring again today.
To sum up, I have an NFS mount inside another NFS mount (for example, /opt/nfs/mount1 and /opt/nfs/mount1/mount2). If I kill a task that is trying to get a file descriptor on /opt/nfs/mount1/mount2, then it gets unmounted. Here is my simple test code, which reproduces the problem very easily:
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char *argv[]) { while (1) { close(open(argv[1], O_RDONLY)); } }
In the logs, I have: "nfs_revalidate_inode: (0:62/845965) getattr failed, error=-512"
Tested on the 5.19 and 6.1 kernels.
Best regards, Sylvain Menu
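A note for readers decoding the log line above: error=-512 is not a userspace errno value. In the kernel's include/linux/errno.h, 512 is ERESTARTSYS, the internal code used when a system call is interrupted by a signal, which is consistent with the failure appearing after the task is killed. The lookup can be sketched as below. The constants are copied by hand from the kernel header, and the names kernel_errno_name, the K_ prefix and the DECODE_STANDALONE macro are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/* Kernel-internal codes, copied by hand from include/linux/errno.h.
 * They are given a K_ prefix here (an assumption of this sketch) because
 * userspace <errno.h> does not define them. */
enum {
    K_ERESTARTSYS           = 512,
    K_ERESTARTNOINTR        = 513,
    K_ERESTARTNOHAND        = 514,
    K_ENOIOCTLCMD           = 515,
    K_ERESTART_RESTARTBLOCK = 516
};

/* Map a (possibly negated) kernel-internal error code to its name. */
static const char *kernel_errno_name(int err)
{
    switch (err < 0 ? -err : err) {
    case K_ERESTARTSYS:           return "ERESTARTSYS";
    case K_ERESTARTNOINTR:        return "ERESTARTNOINTR";
    case K_ERESTARTNOHAND:        return "ERESTARTNOHAND";
    case K_ENOIOCTLCMD:           return "ENOIOCTLCMD";
    case K_ERESTART_RESTARTBLOCK: return "ERESTART_RESTARTBLOCK";
    default:                      return "unknown";
    }
}

/* Build with -DDECODE_STANDALONE to get a small command-line demo. */
#ifdef DECODE_STANDALONE
int main(void)
{
    printf("error=-512 -> %s\n", kernel_errno_name(-512));
    return 0;
}
#endif
```

Compiled with -DDECODE_STANDALONE, the demo prints the name for the value seen in the log; without the macro, the block is just the lookup function.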
On Thu, Mar 09, 2023 at 10:42:41AM +0100, Sylvain Menu wrote:
> [...]
> Tested on 5.19 and 6.1 kernel
So is this a regression or something that has always been present?
thanks,
greg k-h
I think it's a regression, judging by the old resolved bugs/tickets, but I have no idea since when it has been broken again.
On Thu, Mar 9, 2023 at 11:07, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Mar 09, 2023 at 10:42:41AM +0100, Sylvain Menu wrote:
> > [...]
> > Tested on 5.19 and 6.1 kernel
>
> So is this a regression or something that has always been present?
>
> thanks,
>
> greg k-h
No, I can't do that; I found the bug in production by chance. I tried to dive into the code, but it quickly becomes too complex for me. At least it's easy to reproduce with a little script (while(1) timeout my_c.code).
Thanks,
Sylvain Menu
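The "while(1) timeout my_c.code" loop mentioned above could be sketched in C roughly as follows. This is not the actual script from the report: the helper names is_mounted and run_once, the REPRO_STANDALONE macro, the 100 ms run time and the choice of SIGTERM (the default signal sent by timeout(1)) are all assumptions made for illustration. The sketch forks a child that runs the same open/close loop, kills it after a short delay, and repeats until the path is no longer listed in /proc/self/mounts.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <mntent.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Return 1 if `path` is listed as a mount point in /proc/self/mounts,
 * 0 if it is not, and -1 if the mount table cannot be read.
 * `path` must match the table entry exactly (no trailing slash). */
static int is_mounted(const char *path)
{
    FILE *mounts = setmntent("/proc/self/mounts", "r");
    struct mntent *entry;
    int found = 0;

    if (mounts == NULL)
        return -1;
    while ((entry = getmntent(mounts)) != NULL) {
        if (strcmp(entry->mnt_dir, path) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(mounts);
    return found;
}

/* Fork a child that runs the open/close loop from the report on `path`,
 * let it run for `run_ms` milliseconds, then send it `sig`.
 * Returns the signal that terminated the child, or -1 on any failure. */
static int run_once(const char *path, long run_ms, int sig)
{
    struct timespec delay = {
        .tv_sec = run_ms / 1000,
        .tv_nsec = (run_ms % 1000) * 1000000L
    };
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            close(open(path, O_RDONLY)); /* same loop as the original test */
    }
    nanosleep(&delay, NULL);
    if (kill(child, sig) != 0)
        return -1;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

/* Build with -DREPRO_STANDALONE to get a command-line tool. */
#ifdef REPRO_STANDALONE
int main(int argc, char *argv[])
{
    unsigned long rounds = 0;

    if (argc != 2 || is_mounted(argv[1]) != 1) {
        fprintf(stderr, "usage: %s <currently-mounted-path>\n", argv[0]);
        return 2;
    }
    while (is_mounted(argv[1]) == 1) {
        if (run_once(argv[1], 100, SIGTERM) < 0) {
            fprintf(stderr, "run_once failed\n");
            return 1;
        }
        rounds++;
    }
    printf("%s no longer listed in /proc/self/mounts after %lu rounds\n",
           argv[1], rounds);
    return 0;
}
#endif
```

Built with -DREPRO_STANDALONE and given the nested mount point (for example /opt/nfs/mount1/mount2), the tool keeps killing the opener until the mount disappears from the table. On a kernel without the problem it is expected to loop until interrupted.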
On Thu, Mar 9, 2023 at 11:22, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Mar 09, 2023 at 11:17:30AM +0100, Sylvain Menu wrote:
> > I think it's a regression according to the old resolved bugs/tickets but no idea since when it's broken again
>
> Any chance you can do 'git bisect' to find where it broke and what commit broke it?
>
> thanks,
>
> greg k-h
On Fri, 10 Mar 2023, Sylvain Menu wrote:
> No I don't have that, I found the bug in production by no chance. I tried to dive into the code but it quickly becomes complex for me, at least it's easy to reproduce with a little script (while(1) timeout my_c.code)
>
> [...]
>
> > Any chance you can do 'git bisect' to find where it broke and what commit broke it?
Please see https://lore.kernel.org/linux-nfs/87361aovm3.fsf@notabene.neil.brown.name/
I posted a patch for this a couple of years ago, but Trond wouldn't take it.
NeilBrown
linux-stable-mirror@lists.linaro.org