On Tue, Mar 13, 2018 at 11:50 PM, Dave Chinner david@fromorbit.com wrote:
On Tue, Mar 13, 2018 at 04:33:15PM +0200, Amir Goldstein wrote:
On Tue, Mar 13, 2018 at 3:11 PM, Christoph Hellwig hch@lst.de wrote:
On Tue, Mar 13, 2018 at 02:46:09PM +0200, Amir Goldstein wrote:
OK, found the patches that fix the soft lockups in generic/269 and the assertion in generic/232, so expunging those 2 tests from v4.15.y test runs.
Which patches are those? We should probably backport them to 4.15-stable.
Probably, but I guess Darrick has those in his TODO.
There is this series that refers to failure in generic/232: https://marc.info/?l=linux-xfs&m=151701545720824&w=2
These 2 commits refer to generic/269 specifically in commit message:
70c57dcd606f xfs: skip CoW writes past EOF when writeback races with truncate
be78ff0e7277 xfs: recheck reflink / dirty page status before freeing CoW reservations
and the thread on the second commit also mentions generic/270 (I found out the hard way that it also soft locks).
But there are surely more patches for stable in master. I reckon CC: stable and/or Fixes: tags could have been helpful, but I don't see any of those in v4.16-rcX from the core xfs developers.
As I always say: if you want to maintain a stable backport kernel with all the fixes that go into the bleeding edge, you're more than welcome to do it.
Everyone else is flat out just keeping up with ongoing development and fixing bugs in the kernel as it's moving forward. So if you have the need for stable backports, please keep backporting the patches you need, testing them and asking the stable maintainers to include them.
Greg,
I tested the patch in question per Darrick's request. I found no regressions with full "auto" run on xfs with reflinks enabled. Please include this patch in stable 4.15.
Dave,
It is often the case, though maybe not always, that the author of a patch has the knowledge of the 'Fixes' commit and/or the stable kernel version the patch is relevant to or would easily apply to. It is therefore a relatively low effort for a developer to include this information as courtesy to stable maintainers, whether they are maintaining kernel.org stable kernels or distro stable kernels.
That's just my opinion.
Christoph/Darrick,
FYI, with stable kernel 4.15.y, I found the following failures with -g auto:
Assert (mostly quota related): generic/232 xfs/222 xfs/305 xfs/440 xfs/442
Soft lockup (likely fixed by be78ff0e7277): generic/269 generic/270 xfs/442
Failures (output mismatch): xfs/170 xfs/191-input-validation xfs/348
Thanks, Amir.
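For readers unfamiliar with the tags under discussion: "Fixes:" and "Cc: stable" are trailer lines in a commit message. The sketch below only illustrates what they look like and how a script might pick them out. The commit message, hash, subject and names in it are made up, and the regular expressions are an assumption about common formatting, not an official parser.

```python
import re

# Hypothetical commit message for illustration only; the hash and subject
# in the Fixes: line are placeholders, not a real commit.
EXAMPLE_MESSAGE = """\
xfs: example fix subject

Explain the bug and the fix here.

Fixes: 0123456789ab ("xfs: example commit that introduced the bug")
Cc: <stable@vger.kernel.org> # v4.15
Signed-off-by: A Developer <dev@example.com>
"""

# Fixes: <abbreviated sha> ("<subject of the commit that introduced the bug>")
FIXES_RE = re.compile(
    r'^Fixes:\s*([0-9a-f]{8,40})\s*\("(.+)"\)\s*$', re.I | re.M)

# Cc: stable, optionally followed by "# <version note>"
STABLE_RE = re.compile(
    r'^Cc:\s*<?stable@(?:vger\.)?kernel\.org>?\s*(?:#\s*(.*))?$', re.I | re.M)


def stable_hints(message):
    """Return the Fixes: and Cc: stable trailers found in a commit message."""
    fixes = [(m.group(1), m.group(2)) for m in FIXES_RE.finditer(message)]
    stable = [(m.group(1) or "").strip() for m in STABLE_RE.finditer(message)]
    return {"fixes": fixes, "stable": stable}


if __name__ == "__main__":
    print(stable_hints(EXAMPLE_MESSAGE))
```

Whatever such a script finds is only a hint about where a patch might apply; it says nothing about dependencies or testing.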
On Wed, Mar 14, 2018 at 08:24:30AM +0200, Amir Goldstein wrote:
[....]
Dave,
It is often the case, though maybe not always, that the author of a patch has the knowledge of the 'Fixes' commit and/or the stable kernel version the patch is relevant to or would easily apply to.
In my experience (especially from my time as the maintainer) working out where a bug was introduced usually takes more time than it does to notice the bug, fix, test, post and review the patch.
It is therefore a relatively low effort for a developer to include this information
If it were easy, then everyone would just do it and everything would be magically backported and it would all just work and everyone would be happy.
Back in reality, however, it takes a *lot* of time and knowledge to isolate where a bug was introduced and whether the fix is even something we want to backport to older kernels. If you don't know the code, a git bisect is the only resort and even then there's a chance it doesn't isolate the cause because of some other bug or change. And a bisect is even more time and resource intensive than examining code history, especially for bugs introduced a long time ago.
Hence asking developers to do this for every bug they fix ....
as courtesy to stable maintainers,
.... is an unreasonable burden to place on developers and reviewers, especially those that don't really know the code they are fixing in any significant detail.
whether they are maintaining kernel.org stable kernels or distro stable kernels.
I've done my fair share of distro kernel maintenance and 500+ patch backports in the past. Doing backports requires looking at every patch that isn't in the older kernel, working out if the change is necessary and then working out all the dependencies that set of necessary patches requires. It's time consuming, complex, and easy to screw up, especially if you just blindly rely on "fixes" or "stable" comments in commits.
IOWs, if all you're doing is relying on "fixes" tags to determine what /might/ be needed in a stable kernel.org update, then your stable backport process is fundamentally broken. You're going to break things and make stable kernels worse for your users, not better.
And that's ignoring the elephant in the room. The big difference between distro backports and upstream stable kernels is the months of QA and bug fixing spent on the distro backports before any user gets near them. "stable" kernels might only get a couple of days of high level integration testing - it's really only enough to smoke test everything.
We have never had the time and resources to do properly maintained stable backports for upstream kernels because they move so fast and there are so many of them. It's a full time job in itself, and it's substantially wasted effort because the upstream process throws most of that work away every 3 months.
IOWs, if you want to maintain a long term stable upstream kernel backport for XFS, then go and put the effort into doing it properly. Don't demand that upstream developers do extra work on every change they make just so you're not inconvenienced in the future by having to do a little extra work when a one-off fix needs to be backported to a stable kernel.org kernel.
That's just my opinion.
Yup, everyone has one.
Cheers,
Dave.
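To make the cost of the bisect mentioned above concrete, here is a minimal sketch of the underlying binary search. It assumes a strictly linear history, a known-good oldest commit, a known-bad newest commit, and a test that flips from good to bad exactly once. The names `first_bad_commit` and `is_bad` are hypothetical, and this is not how git bisect is implemented internally.

```python
def first_bad_commit(commits, is_bad):
    """Find the first bad commit in a linear history.

    commits: list ordered oldest to newest; commits[0] is assumed good
             and commits[-1] is assumed bad.
    is_bad:  callable that builds/tests one commit and returns True if
             the bug reproduces. Each call stands for a full kernel
             build plus a test run, which is where the time goes.
    Returns (culprit, number_of_test_runs).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], steps


if __name__ == "__main__":
    # Made-up history of 1000 commits with the bug introduced at "c700".
    history = [f"c{i}" for i in range(1000)]
    culprit, runs = first_bad_commit(history, lambda c: int(c[1:]) >= 700)
    print(culprit, runs)
```

If the test result is not monotonic, for example because an unrelated bug makes some intermediate commits fail, the invariant breaks and the search can land on the wrong commit.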
On Wed, Mar 14, 2018 at 11:33:14PM +1100, Dave Chinner wrote:
IOWs, if all you're doing is relying on "fixes" tags to determine what /might/ be needed in a stable kernel.org update, then your stable backport process is fundamentally broken. You're going to break things and make stable kernels worse for your users, not better.
Agreed. As someone who has done a fair share of -stable backports for a customer: The backport to the last stable release is fairly easy, as it means picking everything that is not clearly a feature or cleanup, and you're generally still familiar with the code. It still needs quite a lot of QA time. Backports to older long-term stable bases can become much more hairy very quickly.
In either case Fixes: tags don't help at all. What helps is having one person doing the backports continuously so that they are in the loop. So when I had a paying customer for the backports it was fairly easy for me as I knew where I left off, where I needed to pick up again, and remembered the pitfalls of the old stable code. Randomly Ccing stable or someone working from Fixes tags has none of those benefits. And especially the CC stable is dangerous as there is no QA or detailed review performed.
On Wed, Mar 14, 2018 at 2:49 PM, Christoph Hellwig hch@lst.de wrote:
[....]
Got it.
I also read between the lines that the responsibility of herding the stable patches has shifted from you to Darrick in the last development cycle.
Eventually, I got my answer to how I should make sure my patch finds its way to stable, so I'm good with that.
Only wondering out loud if there should not be a process to expedite last cycle regression fixes, such as my patch, to the stable tree. After all, we are at 4.15.9 and I reported the regression even before v4.15 was released.
Thanks, Amir.
On Wed, Mar 14, 2018 at 05:45:40PM +0200, Amir Goldstein wrote:
I also read between the lines that the responsibility of herding the stable patches has shifted from you to Darrick in the last development cycle.
I have no idea if anyone is taking care of the stable tree at the moment, maybe we'll need a volunteer..
On Wed, Mar 14, 2018 at 05:45:40PM +0200, Amir Goldstein wrote:
[....]
Got it.
I also read between the lines that the responsibility of herding the stable patches has shifted from you to Darrick in the last development cycle.
"..from [Christoph] to /dev/null..." would be more accurate. :(
At this point I must give up the fiction that between prepping/reviewing patches for the next kernel and fixing problems in the current rc I have any time for stable kernel stuff at all.
So, it's open season for anyone who /does/ have the time to pick out fixes and their dependencies, massage them into the appropriate stable kernels, and do at least the minimum xfstests QA (testing a v4, a v5 + everything, and a v5 + everything + 1k block size would be a good start).
Eventually, I got my answer to how I should make sure my patch finds its way to stable, so I'm good with that.
Only wondering out loud if there should not be a process to expedite last cycle regression fixes, such as my patch, to the stable tree. After all, we are at 4.15.9 and I reported the regression even before v4.15 was released.
Aaaanyway, this i_rdev preservation fix is ok for 4.15, since (as Amir has pointed out) it originated in 4.15-rc1.
--D
Thanks, Amir.
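The minimum QA matrix suggested above (a v4 filesystem, a v5 with everything, and a v5 with everything plus 1k blocks) could be written down as xfstests config sections. The sketch below is one possible reading: the section names are made up, and taking "everything" to mean reflink plus rmapbt on top of crc=1 is an assumption. Device and mount point variables are left out and would have to be added for a real run.

```python
# Assumed mkfs.xfs options per configuration. Reading "everything" as
# reflink + rmapbt on a v5 (crc=1) filesystem is a guess, not something
# stated in the thread. Section names are hypothetical.
CONFIGS = {
    "xfs_v4": "-m crc=0",
    "xfs_v5_all": "-m crc=1,reflink=1,rmapbt=1",
    "xfs_v5_all_1k": "-m crc=1,reflink=1,rmapbt=1 -b size=1024",
}


def render_sections(configs, fstyp="xfs"):
    """Render one config section per filesystem configuration."""
    lines = []
    for name, mkfs_opts in configs.items():
        lines.append(f"[{name}]")
        lines.append(f"FSTYP={fstyp}")
        lines.append(f"MKFS_OPTIONS='{mkfs_opts}'")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_sections(CONFIGS))
```

Each section would then get its own full "auto" group run, so the three configurations roughly triple the test time per candidate backport.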
On Wed, Mar 14, 2018 at 08:24:30AM +0200, Amir Goldstein wrote:
Greg,
I tested the patch in question per Darrick's request. I found no regressions with full "auto" run on xfs with reflinks enabled. Please include this patch in stable 4.15.
I have no idea anymore what "this patch" means here :(
Please resend the git commit id of what I need to apply to where.
thanks,
greg k-h
On Mon, Mar 19, 2018 at 3:40 PM, Greg KH gregkh@linuxfoundation.org wrote:
On Wed, Mar 14, 2018 at 08:24:30AM +0200, Amir Goldstein wrote:
Greg,
I tested the patch in question per Darrick's request. I found no regressions with full "auto" run on xfs with reflinks enabled. Please include this patch in stable 4.15.
I have no idea anymore what "this patch" means here :(
Please resend the git commit id of what I need to apply to where.
Please apply commit acd1d71598f7 xfs: preserve i_rdev when recycling a reclaimable inode
to stable kernel v4.15
Thanks, Amir.
On Mon, Mar 19, 2018 at 04:59:00PM +0200, Amir Goldstein wrote:
On Mon, Mar 19, 2018 at 3:40 PM, Greg KH gregkh@linuxfoundation.org wrote:
On Wed, Mar 14, 2018 at 08:24:30AM +0200, Amir Goldstein wrote:
Greg,
I tested the patch in question per Darrick's request. I found no regressions with full "auto" run on xfs with reflinks enabled. Please include this patch in stable 4.15.
I have no idea anymore what "this patch" means here :(
Please resend the git commit id of what I need to apply to where.
Please apply commit acd1d71598f7 xfs: preserve i_rdev when recycling a reclaimable inode
to stable kernel v4.15
Now applied, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org