From: Dave Chinner dchinner@redhat.com
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
The reverted commit added page reference counting to iomap page structures that are used to track block size < page size state. This was supposed to align the code with page migration page accounting assumptions, but what it has done instead is break XFS filesystems. Every fstests run I've done on sub-page block size XFS filesystems has since picking up this commit 2 days ago has failed with bad page state errors such as:
# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038" .... SECTION -- xfs FSTYP -- xfs (debug) PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+ MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
generic/038 454s ... run fstests generic/038 at 2018-12-20 18:43:05 XFS (sdc): Unmounting Filesystem XFS (sdc): Mounting V5 Filesystem XFS (sdc): Ending clean mount BUG: Bad page state in process kswapd0 pfn:3a7fa page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1 flags: 0xfffffc0000000() raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360 raw: 0000000000000001 0000000000000000 00000000ffffffff page dumped because: non-NULL mapping CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 Call Trace: dump_stack+0x67/0x90 bad_page.cold.116+0x8a/0xbd free_pcppages_bulk+0x4bf/0x6a0 free_unref_page_list+0x10f/0x1f0 shrink_page_list+0x49d/0xf50 shrink_inactive_list+0x19d/0x3b0 shrink_node_memcg.constprop.77+0x398/0x690 ? shrink_slab.constprop.81+0x278/0x3f0 shrink_node+0x7a/0x2f0 kswapd+0x34b/0x6d0 ? node_reclaim+0x240/0x240 kthread+0x11f/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x24/0x30 Disabling lock debugging due to kernel taint ....
The failures are from anyway that frees pages and empties the per-cpu page magazines, so it's not a predictable failure or an easy to debug failure.
generic/038 is a reliable reproducer of this problem - it has a 9 in 10 failure rate on one of my test machines. Failure on other machines have been at random points in fstests runs but every run has ended up tripping this problem. Hence generic/038 was used to bisect the failure because it was the most reliable failure.
It is too close to the 4.20 release (not to mention holidays) to try to diagnose, fix and test the underlying cause of the problem, so reverting the commit is the only option we have right now. The revert has been tested against a current tot 4.20-rc7+ kernel across multiple machines running sub-page block size XFs filesystems and none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner dchinner@redhat.com --- fs/iomap.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c index 5bc172f3dfe8..d6bc98ae8d35 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -116,12 +116,6 @@ iomap_page_create(struct inode *inode, struct page *page) atomic_set(&iop->read_count, 0); atomic_set(&iop->write_count, 0); bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE); - - /* - * migrate_page_move_mapping() assumes that pages with private data have - * their count elevated by 1. - */ - get_page(page); set_page_private(page, (unsigned long)iop); SetPagePrivate(page); return iop; @@ -138,7 +132,6 @@ iomap_page_release(struct page *page) WARN_ON_ONCE(atomic_read(&iop->write_count)); ClearPagePrivate(page); set_page_private(page, 0); - put_page(page); kfree(iop); }
On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
From: Dave Chinner dchinner@redhat.com
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
The reverted commit added page reference counting to iomap page structures that are used to track block size < page size state. This was supposed to align the code with page migration page accounting assumptions, but what it has done instead is break XFS filesystems. Every fstests run I've done on sub-page block size XFS filesystems has since picking up this commit 2 days ago has failed with bad page state errors such as:
# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038" .... SECTION -- xfs FSTYP -- xfs (debug) PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+ MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
generic/038 454s ... run fstests generic/038 at 2018-12-20 18:43:05 XFS (sdc): Unmounting Filesystem XFS (sdc): Mounting V5 Filesystem XFS (sdc): Ending clean mount BUG: Bad page state in process kswapd0 pfn:3a7fa page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1 flags: 0xfffffc0000000() raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360 raw: 0000000000000001 0000000000000000 00000000ffffffff page dumped because: non-NULL mapping CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 Call Trace: dump_stack+0x67/0x90 bad_page.cold.116+0x8a/0xbd free_pcppages_bulk+0x4bf/0x6a0 free_unref_page_list+0x10f/0x1f0 shrink_page_list+0x49d/0xf50 shrink_inactive_list+0x19d/0x3b0 shrink_node_memcg.constprop.77+0x398/0x690 ? shrink_slab.constprop.81+0x278/0x3f0 shrink_node+0x7a/0x2f0 kswapd+0x34b/0x6d0 ? node_reclaim+0x240/0x240 kthread+0x11f/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x24/0x30 Disabling lock debugging due to kernel taint ....
The failures are from anyway that frees pages and empties the per-cpu page magazines, so it's not a predictable failure or an easy to debug failure.
generic/038 is a reliable reproducer of this problem - it has a 9 in 10 failure rate on one of my test machines. Failure on other machines have been at random points in fstests runs but every run has ended up tripping this problem. Hence generic/038 was used to bisect the failure because it was the most reliable failure.
It is too close to the 4.20 release (not to mention holidays) to try to diagnose, fix and test the underlying cause of the problem, so reverting the commit is the only option we have right now. The revert has been tested against a current tot 4.20-rc7+ kernel across multiple machines running sub-page block size XFs filesystems and none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner dchinner@redhat.com
fs/iomap.c | 7 ------- 1 file changed, 7 deletions(-)
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
From: Dave Chinner dchinner@redhat.com
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
<snip>
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
This wasn't a stable kernel patch submission. This was to let the stable kernel maintainers know that a commit that was submitted for stable kernel inclusion (via a cc stable in the commit message) is buggy and should not be backported. i.e. if you've already queued/ backported 61c6de6672 then you need to drop it from your trees.
Cheers,
Dave.
On Fri, Dec 21, 2018 at 07:57:21AM +1100, Dave Chinner wrote:
On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
From: Dave Chinner dchinner@redhat.com
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
<snip>
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
This wasn't a stable kernel patch submission. This was to let the stable kernel maintainers know that a commit that was submitted for stable kernel inclusion (via a cc stable in the commit message) is buggy and should not be backported. i.e. if you've already queued/ backported 61c6de6672 then you need to drop it from your trees.
As the commit 61c6de667263 ("fs/iomap.c: get/put the page in iomap_page_create/release()") was tagged for a stable release (and is in 4.19.11) it would be nice to also add the cc: stable in the revert. When you just blindly send it to the stable mailing list, we have no idea what to do with it, hence the automated form letter that was sent.
Now that this is in Linus's tree, I'll dig out the revert and remember to queue it up for the next round of stable updates, thanks for letting us know.
greg k-h
On 12/21/18 1:40 AM, Greg KH wrote:
On Fri, Dec 21, 2018 at 07:57:21AM +1100, Dave Chinner wrote:
On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
From: Dave Chinner dchinner@redhat.com
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
<snip>
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
This wasn't a stable kernel patch submission. This was to let the stable kernel maintainers know that a commit that was submitted for stable kernel inclusion (via a cc stable in the commit message) is buggy and should not be backported. i.e. if you've already queued/ backported 61c6de6672 then you need to drop it from your trees.
As the commit 61c6de667263 ("fs/iomap.c: get/put the page in iomap_page_create/release()") was tagged for a stable release (and is in 4.19.11) it would be nice to also add the cc: stable in the revert. When you just blindly send it to the stable mailing list, we have no idea what to do with it, hence the automated form letter that was sent.
Now that this is in Linus's tree, I'll dig out the revert and remember to queue it up for the next round of stable updates, thanks for letting us know.
As a side note from an interested observer, I had no idea that patches from not-yet-released Linus kernels could make it into a "stable" kernel within 5 days of Linus' commit to an -rc. That seems shockingly fast, with very little margin for error.
-Eric
On Fri, Dec 21, 2018 at 08:40:42AM +0100, Greg KH wrote:
On Fri, Dec 21, 2018 at 07:57:21AM +1100, Dave Chinner wrote:
On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
From: Dave Chinner dchinner@redhat.com
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
<snip>
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
This wasn't a stable kernel patch submission. This was to let the stable kernel maintainers know that a commit that was submitted for stable kernel inclusion (via a cc stable in the commit message) is buggy and should not be backported. i.e. if you've already queued/ backported 61c6de6672 then you need to drop it from your trees.
As the commit 61c6de667263 ("fs/iomap.c: get/put the page in iomap_page_create/release()") was tagged for a stable release (and is in 4.19.11) it would be nice to also add the cc: stable in the revert. When you just blindly send it to the stable mailing list, we have no idea what to do with it, hence the automated form letter that was sent.
Now that this is in Linus's tree, I'll dig out the revert and remember to queue it up for the next round of stable updates, thanks for letting us know.
I've queued the revert for 4.19.
-- Thanks, Sasha
On Thu, Dec 20, 2018 at 06:41:30PM +0100, Christoph Hellwig wrote:
Lets fix the problem instead. I'll dig into it.
I'd say we still need to revert this for 4.20 which is due for release in a couple of days. Most of us are going to be on holidays and away from computers for the next week, so I don't think we have time to adequately review and test a new fix (which doesn't exist yet) before the release occurs.
Cheers,
Dave.
On Thu, Dec 20, 2018 at 12:53 PM Dave Chinner david@fromorbit.com wrote:
I'd say we still need to revert this for 4.20 which is due for release in a couple of days. Most of us are going to be on holidays and away from computers for the next week, so I don't think we have time to adequately review and test a new fix (which doesn't exist yet) before the release occurs.
Yeah. I already applied the revert. If somebody finds a "duh" moment, and an alternate fix gets posted and tested we can revert the revert and fix it properly, but for 4.20 (and for xmas) I do think that just going back to the previous state is otherwise the right choice.
Linus
On Thu, Dec 20, 2018 at 02:42:15PM -0800, Linus Torvalds wrote:
Yeah. I already applied the revert. If somebody finds a "duh" moment, and an alternate fix gets posted and tested we can revert the revert and fix it properly, but for 4.20 (and for xmas) I do think that just going back to the previous state is otherwise the right choice.
As far as I can tell the commit simply missed updating the recounts in the actual migrate callback for which it was added. The fixed version looks something like this (still running xfstests):
diff --git a/fs/iomap.c b/fs/iomap.c index d6bc98ae8d35..20c9c1cadd4e 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -116,6 +116,12 @@ iomap_page_create(struct inode *inode, struct page *page) atomic_set(&iop->read_count, 0); atomic_set(&iop->write_count, 0); bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE); + + /* + * migrate_page_move_mapping() assumes that pages with private data have + * their count elevated by 1. + */ + get_page(page); set_page_private(page, (unsigned long)iop); SetPagePrivate(page); return iop; @@ -132,6 +138,7 @@ iomap_page_release(struct page *page) WARN_ON_ONCE(atomic_read(&iop->write_count)); ClearPagePrivate(page); set_page_private(page, 0); + put_page(page); kfree(iop); }
@@ -556,8 +563,10 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
if (page_has_private(page)) { ClearPagePrivate(page); + get_page(newpage); set_page_private(newpage, page_private(page)); set_page_private(page, 0); + put_page(page); SetPagePrivate(newpage); }
Dave, any chance you can validate Christoph's patch and we could re-apply the fix?
Linus
On Fri, Dec 21, 2018 at 1:39 AM Christoph Hellwig hch@lst.de wrote:
As far as I can tell the commit simply missed updating the recounts in the actual migrate callback for which it was added. The fixed version looks something like this (still running xfstests):
On Fri, Dec 21, 2018 at 11:18:35AM -0800, Linus Torvalds wrote:
Dave, any chance you can validate Christoph's patch and we could re-apply the fix?
I think Dave being in Australia is probably off to his well earned weekend, like I should here in Europe now :)
Given that it took so long to find the original issue I think we should not worry about 4.20 and just get the fix into 4.21-rc early and add it to stable.
On Fri, Dec 21, 2018 at 11:20 AM Christoph Hellwig hch@lst.de wrote:
Given that it took so long to find the original issue I think we should not worry about 4.20 and just get the fix into 4.21-rc early and add it to stable.
Fair enough, I'll stop worrying ;)
Linus
On Fri, Dec 21, 2018 at 11:18:35AM -0800, Linus Torvalds wrote:
Dave, any chance you can validate Christoph's patch and we could re-apply the fix?
Not for a few days - I'm out of the office until after xmas now.
I did notice the same bug in the original patch yesterday (and mentioned it to Darrick on #xfs) when doing post-mortem analysis, but I couldn't tell if that was the only problem all my test machines were tied up testing that the revert didn't introduce/uncover any new problems...
I'll test it as soon as I'm back around the office so we can get it in -rc1 if there's no problems.
Cheers,
Dave.
linux-stable-mirror@lists.linaro.org