From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Cc: Bernd Schubert bschubert@ddn.com
Cc: Miklos Szeredi mszeredi@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: André Draszik andre.draszik@linaro.org
---
resend because of missing people in Cc
---
 fs/fuse/inode.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 549358ffea8b..0b966b0e0962 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1132,10 +1132,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			process_init_limits(fc, arg);
 
 			if (arg->minor >= 6) {
-				u64 flags = arg->flags;
-
-				if (flags & FUSE_INIT_EXT)
-					flags |= (u64) arg->flags2 << 32;
+				u64 flags = arg->flags | (u64) arg->flags2 << 32;
 
 				ra_pages = arg->max_readahead / PAGE_SIZE;
 				if (flags & FUSE_ASYNC_READ)
On Mon, 4 Sept 2023 at 15:34, André Draszik git@andred.net wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
It might break something, but you need to tell us what that is, please.
Thanks, Miklos
On Mon, 2023-09-04 at 15:41 +0200, Miklos Szeredi wrote:
On Mon, 4 Sept 2023 at 15:34, André Draszik git@andred.net wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
It might break something, but you need to tell us what that is, please.
In my case, it's Android.
More generally this breaks all user-spaces that haven't been updated. Not breaking user-space is one of the top rules the kernel has, if not the topmost.
Cheers, Andre
On 9/4/23 16:21, André Draszik wrote:
On Mon, 2023-09-04 at 15:41 +0200, Miklos Szeredi wrote:
On Mon, 4 Sept 2023 at 15:34, André Draszik git@andred.net wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
It might break something, but you need to tell us what that is, please.
In my case, it's Android.
More generally this breaks all user-spaces that haven't been updated. Not breaking user-space is one of the top rules the kernel has, if not the topmost.
Hmm, I guess Android is using one of the extended flags in the meantime. Do you have more data on what exactly fails? I posted this patch last year, when FUSE_INIT_EXT had only recently been introduced, hoping that nothing in production was using these flags yet. But virtiofsd was already using it, so the patch got delayed (I had actually assumed it would just get dropped).
Sorry for the trouble!
Bernd
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Cc: Bernd Schubert bschubert@ddn.com
Cc: Miklos Szeredi mszeredi@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: André Draszik andre.draszik@linaro.org
Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
v2: ping & add ack
v1: resend because of missing people in Cc
---
 fs/fuse/inode.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2e4eb7cf26fb..b21ccc85c47b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1154,10 +1154,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			process_init_limits(fc, arg);
 
 			if (arg->minor >= 6) {
-				u64 flags = arg->flags;
-
-				if (flags & FUSE_INIT_EXT)
-					flags |= (u64) arg->flags2 << 32;
+				u64 flags = arg->flags | (u64) arg->flags2 << 32;
 
 				ra_pages = arg->max_readahead / PAGE_SIZE;
 				if (flags & FUSE_ASYNC_READ)
On 10/18/23 13:15, André Draszik wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Which fuse file system exactly does it break? In fact, not many flags have been added since then - what exactly is broken?
Thanks, Bernd
On Wed, 2023-10-18 at 11:39 +0000, Bernd Schubert wrote:
On 10/18/23 13:15, André Draszik wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Which fuse file system exactly does it break? In fact, not many flags have been added since then - what exactly is broken?
The original patch broke the existing kernel <-> user ABI by now requiring user space applications to pass in an extra flag. There are various side-effects of this, like unbootable systems, just because the kernel was updated. Breaking the ABI is the one thing that is not allowed. This is not specific to any particular fuse file system.
Kind Regards, Andre
On 10/18/23 13:46, André Draszik wrote:
On Wed, 2023-10-18 at 11:39 +0000, Bernd Schubert wrote:
On 10/18/23 13:15, André Draszik wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Which fuse file system exactly does it break? In fact, not many flags have been added since then - what exactly is broken?
The original patch broke the existing kernel <-> user ABI by now requiring user space applications to pass in an extra flag. There are various side-effects of this, like unbootable systems, just because the kernel was updated. Breaking the ABI is the one thing that is not allowed. This is not specific to any particular fuse file system.
How exactly did it break it? These are feature flags - is there really a file system that relies on these flags to the extent that it does not work anymore?
Also, reverting this patch has the side effect that you can ask the kernel to use uninitialized bits - which obviously has other side effects.
Thanks, Bernd
On Wed, 2023-10-18 at 11:52 +0000, Bernd Schubert wrote:
On 10/18/23 13:46, André Draszik wrote:
On Wed, 2023-10-18 at 11:39 +0000, Bernd Schubert wrote:
On 10/18/23 13:15, André Draszik wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Which fuse file system exactly does it break? In fact, not many flags have been added since then - what exactly is broken?
The original patch broke the existing kernel <-> user ABI by now requiring user space applications to pass in an extra flag. There are various side-effects of this, like unbootable systems, just because the kernel was updated. Breaking the ABI is the one thing that is not allowed. This is not specific to any particular fuse file system.
How exactly did it break it?
At least in Android, creating new files, or reading existing files returns -EFAULT
These are feature flags - is there really a file system that relies on these flags to the extent that it does not work anymore?
I don't know enough about the implementation details, but even outside Android user space had to be updated as a prerequisite for this kernel patch: https://lore.kernel.org/all/YmUKZQKNAGimupv7@redhat.com/ https://github.com/libfuse/libfuse/pull/662
Which means any non-Android user space predating those changes isn't working anymore either.
Cheers, Andre
On 10/18/23 16:26, André Draszik wrote:
On Wed, 2023-10-18 at 11:52 +0000, Bernd Schubert wrote:
On 10/18/23 13:46, André Draszik wrote:
On Wed, 2023-10-18 at 11:39 +0000, Bernd Schubert wrote:
On 10/18/23 13:15, André Draszik wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Which fuse file system exactly does it break? In fact, not many flags have been added since then - what exactly is broken?
The original patch broke the existing kernel <-> user ABI by now requiring user space applications to pass in an extra flag. There are various side-effects of this, like unbootable systems, just because the kernel was updated. Breaking the ABI is the one thing that is not allowed. This is not specific to any particular fuse file system.
How exactly did it break it?
At least in Android, creating new files, or reading existing files returns -EFAULT
Hmm, could you please point me to the corresponding android userspace library? I guess it is not using libfuse? At least I would like to understand the issue...
These are feature flags - is there really a file system that relies on these flags to the extent that it does not work anymore?
I don't know enough about the implementation details, but even outside Android user space had to be updated as a prerequisite for this kernel patch: https://lore.kernel.org/all/YmUKZQKNAGimupv7@redhat.com/ https://github.com/libfuse/libfuse/pull/662
Which means any non-Android user space predating those changes isn't working anymore either.
The patch in libfuse is from me, there was nothing broken. And I don't think that any of the additional flags added are a _requirement_ for libfuse file systems to work. I'm not sure if DAX and the other flags before the patch was merged are a _requirement_ for virtiofsd or just a nice feature to have...
In any case, please still consider that using possibly uninitialized flags is not a good idea either and could randomly break things as well.
Thanks, Bernd
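The concern about uninitialized flags can be made concrete with a small user-space sketch in C. Everything here is an assumption for illustration: `struct sketch_init_reply` is a hypothetical stand-in for the two flag words of the INIT reply, `SKETCH_FUSE_INIT_EXT` is assumed to be bit 30 as in the upstream header, and `old_daemon_fill_reply` models an imagined daemon that predates flags2 and reuses a dirty buffer. Whether any real daemon behaves this way is not established in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Assumed value, for illustration only. */
#define SKETCH_FUSE_INIT_EXT (1u << 30)

/* Hypothetical stand-in for the two flag words of the INIT reply. */
struct sketch_init_reply {
	uint32_t flags;   /* low 32 feature bits */
	uint32_t flags2;  /* high 32 feature bits */
};

/*
 * Imagined daemon that predates flags2: it writes only the low word
 * and leaves whatever was in the buffer in the high word.
 */
static void old_daemon_fill_reply(struct sketch_init_reply *out,
				  uint32_t low_flags)
{
	memset(out, 0xAB, sizeof(*out)); /* stand-in for stale buffer contents */
	out->flags = low_flags;          /* flags2 is never written */
}

/*
 * High feature bits the receiver would act on.
 * require_init_ext != 0 models the gated behaviour (high word ignored
 * unless the INIT_EXT bit is set); 0 models the unconditional merge.
 */
static uint32_t high_bits_seen(const struct sketch_init_reply *arg,
			       int require_init_ext)
{
	if (require_init_ext && !(arg->flags & SKETCH_FUSE_INIT_EXT))
		return 0;
	return arg->flags2;
}
```

In this model the unconditional merge hands the stale bytes to the receiver as if they were feature requests, while the gated variant ignores them because the daemon never set the extension bit.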
On 10/18/23 16:40, Bernd Schubert wrote:
On 10/18/23 16:26, André Draszik wrote:
On Wed, 2023-10-18 at 11:52 +0000, Bernd Schubert wrote:
On 10/18/23 13:46, André Draszik wrote:
On Wed, 2023-10-18 at 11:39 +0000, Bernd Schubert wrote:
On 10/18/23 13:15, André Draszik wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Which fuse file system exactly does it break? In fact, not many flags have been added since then - what exactly is broken?
The original patch broke the existing kernel <-> user ABI by now requiring user space applications to pass in an extra flag. There are various side-effects of this, like unbootable systems, just because the kernel was updated. Breaking the ABI is the one thing that is not allowed. This is not specific to any particular fuse file system.
How exactly did it break it?
At least in Android, creating new files, or reading existing files returns -EFAULT
Hmm, could you please point me to the corresponding android userspace library? I guess it is not using libfuse? At least I would like to understand the issue...
These are feature flags - is there really a file system that relies on these flags to the extent that it does not work anymore?
I don't know enough about the implementation details, but even outside Android user space had to be updated as a prerequisite for this kernel patch: https://lore.kernel.org/all/YmUKZQKNAGimupv7@redhat.com/ https://github.com/libfuse/libfuse/pull/662
Which means any non-Android user space predating those changes isn't working anymore either.
The patch in libfuse is from me, there was nothing broken. And I don't think that any of the additional flags added are a _requirement_ for libfuse file systems to work. I'm not sure if DAX and the other flags before the patch was merged are a _requirement_ for virtiofsd or just a nice feature to have...
Looking at the android kernel source:
/*
 * For FUSE < 7.36 FUSE_PASSTHROUGH has value (1 << 31).
 * This condition check is not really required, but would prevent having a
 * broken commit in the tree.
 */
#if FUSE_KERNEL_VERSION > 7 || \
	(FUSE_KERNEL_VERSION == 7 && FUSE_KERNEL_MINOR_VERSION >= 36)
#define FUSE_PASSTHROUGH (1ULL << 63)
#else
#define FUSE_PASSTHROUGH (1 << 31)
#endif
So passthrough gets broken with this check and android heavily uses that. Would be interesting to know if this could result in EFAULT.
Thanks, Bernd
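To see why a flag at bit 63 is affected, here is a minimal user-space sketch in C comparing the two ways of assembling the 64-bit flag word shown in the diff. The struct and the `SKETCH_` constants are hypothetical stand-ins: bit 63 is taken from the Android snippet above and bit 30 for FUSE_INIT_EXT is assumed from the upstream header. It only shows that the passthrough bit is dropped when the extension bit is absent; it says nothing about whether that leads to -EFAULT.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_FUSE_INIT_EXT    (1u << 30)
#define SKETCH_FUSE_PASSTHROUGH (1ULL << 63)

/* Hypothetical stand-in for the two flag words of the INIT reply. */
struct sketch_init_reply {
	uint32_t flags;   /* low 32 feature bits */
	uint32_t flags2;  /* high 32 feature bits */
};

/* Unconditional merge: what the revert would restore. */
static uint64_t flags_unconditional(const struct sketch_init_reply *arg)
{
	return arg->flags | ((uint64_t)arg->flags2 << 32);
}

/* Gated merge: the high word counts only if the INIT_EXT bit is set. */
static uint64_t flags_gated(const struct sketch_init_reply *arg)
{
	uint64_t flags = arg->flags;

	if (flags & SKETCH_FUSE_INIT_EXT)
		flags |= (uint64_t)arg->flags2 << 32;
	return flags;
}
```

A reply with `flags2 = 1u << 31` and the extension bit clear yields the passthrough bit under the unconditional merge but not under the gated one; setting bit 30 in `flags` makes both agree.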
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting for once, to make this easily accessible to everyone.
Miklos, I'm wondering what the status here is. The descriptions in the reverts André sent[1] are maybe a bit vague[2], but it sounds a lot like he ran into a big regression that should be addressed somehow -- maybe with a revert. But it seems we haven't got any closer to that in the ~7 weeks since the first revert was posted. But I might be missing something, hence a quick evaluation from your side would help me a lot here to understand the situation.
[1] https://lore.kernel.org/lkml/20230904133321.104584-1-git@andred.net/ https://lore.kernel.org/lkml/20231018111508.3913860-1-git@andred.net/
[2] Does this happen on all Android versions or just some? And what is actually breaking (this was answered somewhere in the thread iirc)?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On Wed, Oct 25, 2023 at 1:30 PM Linux regression tracking (Thorsten Leemhuis) regressions@leemhuis.info wrote:
Miklos, I'm wondering what the status here is. The descriptions in the reverts André sent[1] are maybe a bit vague[2], but it sounds a lot like he ran into a big regression that should be addressed somehow -- maybe with a revert. But it seems we haven't got any closer to that in the ~7 weeks since the first revert was posted. But I might be missing something, hence a quick evaluation from your side would help me a lot here to understand the situation.
I don't think the Android use case counts as a regression.
If they'd use an unmodified upstream kernel, it would be a different case.
But they modify the kernel heavily, and AFAICS this breakage is related to such a modification (as pointed out by Bernd upthread).
André might want to clarify, but I've not seen any concrete real world examples of regressions caused by this change outside of Android.
Thanks, Miklos
On 25.10.23 15:17, Miklos Szeredi wrote:
On Wed, Oct 25, 2023 at 1:30 PM Linux regression tracking (Thorsten Leemhuis) regressions@leemhuis.info wrote:
Miklos, I'm wondering what the status here is. The descriptions in the reverts André sent[1] are maybe a bit vague[2], but it sounds a lot like he ran into a big regression that should be addressed somehow -- maybe with a revert. But it seems we haven't got any closer to that in the ~7 weeks since the first revert was posted. But I might be missing something, hence a quick evaluation from your side would help me a lot here to understand the situation.
First, many thx for the reply.
I don't think the Android use case counts as a regression.
If they'd use an unmodified upstream kernel, it would be a different case.
But they modify the kernel heavily, and AFAICS this breakage is related to such a modification (as pointed out by Bernd upthread).
Not sure who you mean with "they" here.
Isn't the main question if André used a vanilla kernel beforehand on those Android devices and now is unable to do so? André, is that the case? Or did you only encounter this regression when switching from a patched kernel to a vanilla kernel?
Also: André, do you see this in some test env, or in some real use case where others might also run into the problem?
André might want to clarify, but I've not seen any concrete real world examples of regressions caused by this change outside of Android.
Yeah, some clarification from André really would be helpful.
Thx again for the answer.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On Wed, Oct 25, 2023 at 03:17:09PM +0200, Miklos Szeredi wrote:
On Wed, Oct 25, 2023 at 1:30 PM Linux regression tracking (Thorsten Leemhuis) regressions@leemhuis.info wrote:
Miklos, I'm wondering what the status here is. The descriptions in the reverts André sent[1] are maybe a bit vague[2], but it sounds a lot like he ran into a big regression that should be addressed somehow -- maybe with a revert. But it seems we haven't got any closer to that in the ~7 weeks since the first revert was posted. But I might be missing something, hence a quick evaluation from your side would help me a lot here to understand the situation.
I don't think the Android use case counts as a regression.
Why not? In the changelog for this commit, it says:
There is a risk with this change, though - it might break existing user space libraries, which are already using flags2 without setting FUSE_INIT_EXT.
And that's exactly what Android was doing. Not all the world uses libfuse, unfortunately.
Yes, Android did have an out-of-tree change to support a fuse extension that is not accepted upstream yet (but I think they submitted it already), and they had to figure out the "safest" way to do so to keep compatibility with everything else.
Now yes, that attempt failed, and now older Android userspace breaks with newer kernels because of this commit, which you all even agreed might happen here!
So either you have a policy of "we only care about libfuse use cases for this api", or you don't, which is fine, just say so. But that's not what the changelog says.
If they'd use an unmodified upstream kernel, it would be a different case.
But they modify the kernel heavily, and AFAICS this breakage is related to such a modification (as pointed out by Bernd upthread).
They add a new fuse extension, yes. How do you suggest they do so in an abi-safe way for the future when features are not accepted by upstream?
André might want to clarify, but I've not seen any concrete real world examples of regressions caused by this change outside of Android.
Android is _only_ a few billion devices, it doesn't get much more "real world" than that. All other Linux instances are just a rounding error :)
thanks,
greg k-h
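For reference, the changelog's requirement that user space set FUSE_INIT_EXT when it uses flags2 amounts to something like the following C sketch on the daemon side. The struct, the `SKETCH_FUSE_INIT_EXT` value (assumed to be bit 30) and the helper name `encode_init_flags` are all hypothetical and not taken from libfuse or the Android code. A real daemon would presumably also check that the kernel advertised the extension before relying on it, which this sketch omits.

```c
#include <stdint.h>

/* Assumed value, for illustration only. */
#define SKETCH_FUSE_INIT_EXT (1u << 30)

/* Hypothetical stand-in for the two flag words of the INIT reply. */
struct sketch_init_reply {
	uint32_t flags;   /* low 32 feature bits */
	uint32_t flags2;  /* high 32 feature bits */
};

/*
 * Split a 64-bit set of wanted feature bits into the two reply words
 * and mark the high word as valid whenever it is non-zero.
 */
static void encode_init_flags(uint64_t wanted, struct sketch_init_reply *out)
{
	out->flags = (uint32_t)(wanted & 0xffffffffu);
	out->flags2 = (uint32_t)(wanted >> 32);
	if (out->flags2 != 0)
		out->flags |= SKETCH_FUSE_INIT_EXT;
}
```

A user space that writes the high word without the final step is the case the changelog warns about: the gated kernel-side check then discards every bit above 31.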
On Fri, Oct 27, 2023 at 12:40 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Wed, Oct 25, 2023 at 03:17:09PM +0200, Miklos Szeredi wrote:
I don't think the Android use case counts as a regression.
Why not? In the changelog for this commit, it says:
There is a risk with this change, though - it might break existing user space libraries, which are already using flags2 without setting FUSE_INIT_EXT.
And that's exactly what Android was doing. Not all the world uses libfuse, unfortunately.
No, this is not about libfuse or not libfuse. It's about upstream or downstream. If upstream maintainers would need to care about downstream regressions, then it would be hell.
How should Android handle this? Here's how: they have an internal patch, which conflicts with the patch they want to revert. Well, let them revert that patch in their kernel. It's not like it's a big maintenance burden, since it's just a few lines. This is the sort of thing that downstream maintainers do all the time.
It's a no-brainer, what are we talking about then?
Thanks, Miklos
On Fri, Oct 27, 2023 at 02:36:55PM +0200, Miklos Szeredi wrote:
On Fri, Oct 27, 2023 at 12:40 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Wed, Oct 25, 2023 at 03:17:09PM +0200, Miklos Szeredi wrote:
I don't think the Android use case counts as a regression.
Why not? In the changelog for this commit, it says:
There is a risk with this change, though - it might break existing user space libraries, which are already using flags2 without setting FUSE_INIT_EXT.
And that's exactly what Android was doing. Not all the world uses libfuse, unfortunately.
No, this is not about libfuse or not libfuse. It's about upstream or downstream. If upstream maintainers would need to care about downstream regressions, then it would be hell.
I agree, that's not what I'm saying here.
How should Android handle this? Here's how: they have an internal patch, which conflicts with the patch they want to revert. Well, let them revert that patch in their kernel. It's not like it's a big maintenance burden, since it's just a few lines. This is the sort of thing that downstream maintainers do all the time.
It's a no-brainer, what are we talking about then?
I'm talking about a patch where you are changing the existing user/kernel api by filtering out values that you previously accepted. And it was done in a patch saying "this might break userspace", and guess what, it did!
So why not revert it as obviously you all anticipated that this might happen?
The "internal" patch from Android was just using the upper values of the fuse api because they didn't want to conflict with the upstream values before their code was accepted (and it was submitted already, but not accepted.)
So how do you want developers to work on changes before they are accepted with this user/kernel numbering scheme that you have? You just broke anyone who was using a not-accepted-in-the-tree value, right?
thanks,
greg k-h
On Fri, Oct 27, 2023 at 2:46 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
I'm talking about a patch where you are changing the existing user/kernel api by filtering out values that you previously accepted. And it was done in a patch saying "this might break userspace", and guess what, it did!
So why not revert it as obviously you all anticipated that this might happen?
Because it's a useful patch, and while I mentioned the possibility of a regression, I definitely didn't expect it to happen.
And I still think that the Android case doesn't count, because it's just a completely different environment. What can happen on Android may not happen on non-Android and vice versa. Why should I revert a useful patch, because it causes a regression in a downstream kernel, because of an Android only patch?
The "internal" patch from Android was just using the upper values of the fuse api because they didn't want to conflict with the upstream values before their code was accepted (and it was submitted already, but not accepted.)
So how do you want developers to work on changes before they are accepted with this user/kernel numbering scheme that you have? You just broke anyone who was using a not-accepted-in-the-tree value, right?
Again, upstream and downstream. There's a reason why some companies have upstream first policies: because it's less painful in the long run. Android having decided to go ahead and add that patch is not my problem, and I really really don't want to care.
Having said all that, if there's a regression that someone reports for upstream flags (even on a vendor kernel), I'll just revert the patch right away.
Thanks, Miklos
On Fri, Oct 27, 2023 at 03:03:28PM +0200, Miklos Szeredi wrote:
On Fri, Oct 27, 2023 at 2:46 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
I'm talking about a patch where you are changing the existing user/kernel api by filtering out values that you previously accepted. And it was done in a patch saying "this might break userspace", and guess what, it did!
So why not revert it as obviously you all anticipated that this might happen?
Because it's a useful patch, and while I mentioned the possibility of a regression, I definitely didn't expect it to happen.
But it did :(
And I still think that the Android case doesn't count, because it's just a completely different environment. What can happen on Android may not happen on non-Android and vice versa. Why should I revert a useful patch, because it causes a regression in a downstream kernel, because of an Android only patch?
It's not all that different an environment: they use a stock kernel, and for many years now you have been able to boot an Android device just fine without any changes.
I would argue there are fewer changes in an android kernel than an "enterprise" linux distro kernel these days by far :)
The "internal" patch from Android was just using the upper values of the fuse api because they didn't want to conflict with the upstream values before their code was accepted (and it was submitted already, but not accepted.)
So how do you want developers to work on changes before they are accepted with this user/kernel numbering scheme that you have? You just broke anyone who was using a not-accepted-in-the-tree value, right?
Again, upstream and downstream. There's a reason why some companies have upstream first policies: because it's less painful in the long run. Android having decided to go ahead and add that patch is not my problem, and I really really don't want to care.
I think you rejected Android's changes, so what were they supposed to do? Or someone did, I can't remember when it was submitted, but I do remember seeing the patches flow by on some list...
Having said all that, if there's a regression that someone reports for upstream flags (even on a vendor kernel), I'll just revert the patch right away.
So because Android userspace is sending a flag value that is not in the upstream table, this breakage is ok? Or do you mean something else, I'm getting confused.
thanks,
greg k-h
On Fri, Oct 27, 2023 at 3:12 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
So because Android userspace is sending a flag value that is not in the upstream table, this breakage is ok? Or do you mean something else, I'm getting confused.
From my POV the regression in the Android kernel was due to the Android patch that added those flags.
Not all flags are equal, some applications use a specific set of flags and another set of applications use another set. Non-Android apps won't use the flag that Android added, for obvious reasons.
I still don't see why we'd need to revert this patch due to regressions in Android. Maybe I'm really dumb, but I just don't get it.
Thanks, Miklos
On Fri, 2023-10-27 at 15:03 +0200, Miklos Szeredi wrote:
Again, upstream and downstream. There's a reason why some companies have upstream first policies: because it's less painful in the long run. Android having decided to go ahead and add that patch is not my problem, and I really really don't want to care.
Having said all that, if there's a regression that someone reports for upstream flags (even on a vendor kernel), I'll just revert the patch right away.
The patch in question has broken all users that use the higher flags and that don't use your version of libfuse, not just Android. You're filtering them out now when you didn't at the time that those ('official') high flags were added. There are a couple more high flags than just the one that Android added.
Cheers, Andre'
On Fri, Oct 27, 2023 at 3:14 PM André Draszik andre.draszik@linaro.org wrote:
The patch in question has broken all users that use the higher flags and that don't use your version of libfuse, not just Android. You're filtering them out now when you didn't at the time that those ('official') high flags were added. There are a couple more high flags than just the one that Android added.
Okay. Where are all those users? Why haven't they reported this?
Thanks, Miklos
On Fri, 2023-10-27 at 15:24 +0200, Miklos Szeredi wrote:
On Fri, Oct 27, 2023 at 3:14 PM André Draszik andre.draszik@linaro.org wrote:
The patch in question has broken all users that use the higher flags and that don't use your version of libfuse, not just Android. You're filtering them out now when you didn't at the time that those ('official') high flags were added. There are a couple more high flags than just the one that Android added.
Okay. Where are all those users?
That's not the point. The point is the kernel<->user API has rendered them too non-working.
Why haven't they reported this?
Again, not the point. If I was to ask my crystal ball, Android is trying to track the upstream kernel closely, others might not and might not have bumped into this issue yet. Still, not the point.
Cheers, A.
On Fri, Oct 27, 2023 at 3:39 PM André Draszik andre.draszik@linaro.org wrote:
On Fri, 2023-10-27 at 15:24 +0200, Miklos Szeredi wrote:
On Fri, Oct 27, 2023 at 3:14 PM André Draszik andre.draszik@linaro.org wrote:
The patch in question has broken all users that use the higher flags and that don't use your version of libfuse, not just Android. You're filtering them out now when you didn't at the time that those ('official') high flags were added. There are a couple more high flags than just the one that Android added.
Okay. Where are all those users?
That's not the point. The point is that the change to the kernel<->user API has rendered them non-working, too.
It is a very important point. A theoretical bug isn't a regression. Nor is a broken test case BTW.
Please read section 'What is a "regression" and what is the "no regressions rule"?' in Documentation/admin-guide/reporting-regressions.rst.
Thanks, Miklos
On 27.10.23 16:05, Miklos Szeredi wrote:
On Fri, Oct 27, 2023 at 3:39 PM André Draszik andre.draszik@linaro.org wrote:
On Fri, 2023-10-27 at 15:24 +0200, Miklos Szeredi wrote:
On Fri, Oct 27, 2023 at 3:14 PM André Draszik andre.draszik@linaro.org wrote:
The patch in question has broken all users that use the higher flags and that don't use your version of libfuse, not just Android. You're filtering them out now when you didn't at the time that those ('official') high flags were added. There are a couple more high flags than just the one that Android added.
Okay. Where are all those users?
That's not the point. The point is that the change to the kernel<->user API has rendered them non-working, too.
It is a very important point. A theoretical bug isn't a regression. Nor is a broken test case BTW.
Please read section 'What is a "regression" and what is the "no regressions rule"?' in Documentation/admin-guide/reporting-regressions.rst.
I'm taken a bit back and forth here and it seems we are stuck again. So let me try again to hopefully clear things up a bit:
André, could you please state
* What practical use-case actually stopped working?
* What Linux kernel version actually worked for you (because if things broke when you upgraded from a vendor kernel to a vanilla kernel, then this does not qualify as a regression IMHO)
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
Hi Thorsten,
(sorry for the slow reply)
On Wed, 2023-11-01 at 13:36 +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
I'm taken a bit back and forth here and it seems we are stuck again. So let me try again to hopefully clear things up a bit:
André, could you please state
- What practical use-case actually stopped working?
It's impossible to use a newer kernel together with an older user space which is something that Android had been supporting for a long time.
- What Linux kernel version actually worked for you (because if things broke when you upgraded from a vendor kernel to a vanilla kernel, then this does not qualify as a regression IMHO)
We are using the Android kernel in all cases and Android applies patches on top of Linus' tree, yes (as does everybody else). The previous Android kernel worked, the current Android kernel doesn't because of the patch in question.
I think Greg made some valid points before: https://lore.kernel.org/all/2023102731-wobbly-glimpse-97f5@gregkh/
I'm talking about a patch where you are changing the existing user/kernel api by filtering out values that you previously accepted. And it was done in a patch saying "this might break userspace", and guess what, it did!
I guess it boils down to an agreement regarding Greg's previous questions/points: https://lore.kernel.org/all/2023102757-cornflake-pry-e788@gregkh/
So because Android userspace is sending a flag value that is not in the upstream table, this breakage is ok?
and https://lore.kernel.org/all/2023102740-think-hatless-ab87@gregkh/
now older Android userspace breaks with newer kernels because of this commit, which you all even agreed might happen here!
So either you have a policy of "we only care about libfuse use cases for this api", or you don't, which is fine, just say so. But that's not what the changelog says.
But I agree, it seems we're stuck and I'm not sure how to resolve this either, Miklos has his points, Android has a different position.
Cheers, Andre'
On Wed, Nov 8, 2023 at 11:31 AM André Draszik andre.draszik@linaro.org wrote:
We are using the Android kernel in all cases and Android applies patches on top of Linus' tree, yes (as does everybody else). The previous Android kernel worked, the current Android kernel doesn't because of the patch in question.
Why don't you revert the patch in question in the Android kernel?
Thanks, Miklos
On 08.11.23 11:31, André Draszik wrote:
[...] But I agree, it seems we're stuck and I'm not sure how to resolve this either, Miklos has his points, Android has a different position.
FWIW, this thread died down without any agreement on whether this is a regression or not. Continuing to track it as one is likely not worth the effort, hence I'll remove it from the list of tracked issues. If anyone still thinks this is something that should be fixed, I'd say that person should revive this thread and bring Linus in (but FWIW, I pointed him at this thread once already).
#regzbot inconclusive: people can't agree if this is a regression or not and it seems people stopped caring
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On 10/27/23 14:46, Greg Kroah-Hartman wrote:
On Fri, Oct 27, 2023 at 02:36:55PM +0200, Miklos Szeredi wrote:
On Fri, Oct 27, 2023 at 12:40 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Wed, Oct 25, 2023 at 03:17:09PM +0200, Miklos Szeredi wrote:
I don't think the Android use case counts as a regression.
Why not? In the changelog for this commit, it says:
There is a risk with this change, though - it might break existing user space libraries, which are already using flags2 without setting FUSE_INIT_EXT.
And that's exactly what Android was doing. Not all the world uses libfuse, unfortunately.
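To make the changelog's risk concrete, here is a minimal sketch of the two ways the kernel can combine the reply's flag words, modelled on the hunk in the revert at the top of this thread. The struct and function names are hypothetical stand-ins (the real code lives in process_init_reply() and reads struct fuse_init_out), and the bit position used for FUSE_INIT_EXT is an assumption made for illustration.

```c
#include <stdint.h>

/* Assumed bit position for FUSE_INIT_EXT; illustration only. */
#define SKETCH_FUSE_INIT_EXT (1u << 30)

/* Hypothetical, trimmed-down stand-in for the two flag words of the init reply. */
struct sketch_init_flags {
	uint32_t flags;   /* low 32 bits of the feature flags */
	uint32_t flags2;  /* high 32 bits of the feature flags */
};

/* Behaviour the revert restores: flags2 is always honoured. */
static inline uint64_t sketch_combine_unconditional(const struct sketch_init_flags *arg)
{
	return arg->flags | (uint64_t)arg->flags2 << 32;
}

/* Behaviour of the commit under discussion: flags2 is honoured only
 * when the server also set FUSE_INIT_EXT in the low word. */
static inline uint64_t sketch_combine_checked(const struct sketch_init_flags *arg)
{
	uint64_t flags = arg->flags;

	if (flags & SKETCH_FUSE_INIT_EXT)
		flags |= (uint64_t)arg->flags2 << 32;
	return flags;
}
```

A server that puts a bit into flags2 without setting FUSE_INIT_EXT gets that bit through the unconditional variant but silently loses it in the checked one, which is the breakage being argued about here.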
No, this is not about libfuse or not libfuse. It's about upstream or downstream. If upstream maintainers would need to care about downstream regressions, then it would be hell.
I agree, that's not what I'm saying here.
How should Android handle this? Here's how: they have an internal patch, which conflicts with the patch they want to revert. Well, let them revert that patch in their kernel. It's not like it's a big maintenance burden, since it's just a few lines. This is the sort of thing that downstream maintainers do all the time.
It's a no-brainer, what are we talking about then?
I'm talking about a patch where you are changing the existing user/kernel api by filtering out values that you previously accepted. And it was done in a patch saying "this might break userspace", and guess what, it did!
So why not revert it as obviously you all anticipated that this might happen?
The "internal" patch from Android was just using the upper values of the fuse api because they didn't want to conflict with the upstream values before their code was accepted (and it was submitted already, but not accepted.)
So how do you want developers to work on changes before they are accepted with this user/kernel numbering scheme that you have? You just broke anyone who was using a not-accepted-in-the-tree value, right?
It is not related to accepted-in-tree; rather, the server (userspace) side now has to set the flag "FUSE_INIT_EXT" itself, to tell the kernel it knows about these flags and that these bits are initialized. Not checking for this flag would cause random issues with any fuse userspace/server side that does not zero 'struct fuse_init_out' - from my point of view that is a regression as well. And in that sense the patch is a regression fix - which now has other side effects.
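As a rough illustration of what this asks of the server side, the sketch below fills an init reply so that no uninitialized bytes reach the kernel and the high flag word is announced explicitly. Everything here is hypothetical: the struct is a cut-down stand-in for 'struct fuse_init_out', the helper name is invented, and the bit position for FUSE_INIT_EXT is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Assumed bit position for FUSE_INIT_EXT; illustration only. */
#define EXAMPLE_FUSE_INIT_EXT (1u << 30)

/* Hypothetical stand-in; the real struct fuse_init_out has more fields. */
struct example_init_out {
	uint32_t major;
	uint32_t minor;
	uint32_t max_readahead;
	uint32_t flags;   /* low 32 bits of the feature flags */
	uint32_t flags2;  /* high 32 bits of the feature flags */
};

/* Fill the reply from a 64-bit set of wanted feature flags. */
static inline void example_fill_init_reply(struct example_init_out *out, uint64_t wanted)
{
	/* Zero everything first so no stale stack or heap bytes are sent. */
	memset(out, 0, sizeof(*out));

	out->flags = (uint32_t)wanted;
	out->flags2 = (uint32_t)(wanted >> 32);

	/* Announce that flags2 is initialized whenever it carries any bit. */
	if (out->flags2)
		out->flags |= EXAMPLE_FUSE_INIT_EXT;
}
```

A server written this way behaves the same with or without the check in the kernel; only servers that set high bits without the announcement are affected.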
Regarding out-of-tree development, I'm just right now in the position of having to backport my own features (while they are still in development and being pushed upstream) to older distribution kernels. This is one of the reasons why I currently look at each and every fuse patch that is posted to the list - I'm looking at whether it can cause issues for my company/myself, and at the same time I can give a few reviews and maybe help Miklos that way. (@Amir overlay patches are on my todo list). To be honest, given the rather huge Android overlay patch, I'm missing reviews from the Android team... Especially for Amir, who took over overlay work. And if the Android team had monitored and reviewed, they would have noticed the possible issue beforehand.
Thanks, Bernd
[TLDR: I'm adding this report to the list of tracked Linux kernel regressions; the text you find below is based on a few templates paragraphs you might have encountered already in similar form. See link in footer if these mails annoy you.]
On 04.09.23 15:33, André Draszik wrote:
From: André Draszik andre.draszik@linaro.org
This reverts commit 3066ff93476c35679cb07a97cce37d9bb07632ff.
This patch breaks all existing userspace by requiring updates as mentioned in the commit message, which is not allowed.
Revert to restore compatibility with existing userspace implementations.
Thanks for the report. To be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced 3066ff93476c35679cb07a97cce37d9bb07632
#regzbot title fuse: creating new files, or reading existing files in Android, now returns -EFAULT
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply and tell me -- ideally while also telling regzbot about it, as explained by the page listed in the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing to the report (the parent of this mail). See page linked in footer for details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr That page also explains what to do if mails like this annoy you.