Hi Greg,
This is the 2nd round of xfs fixes for 5.10.y.
The fixes in this part are from circa v5.11..v5.12. Per Luis' advice, I am posting a small number of fixes per LTS release.
These fixes have been soaking in the kdevops test env for the past week with no regressions observed from baseline 5.10.y. One backported fix was identified as a stable regression, so it was removed even before being posted as a candidate.
These fixes have been posted to review on xfs list [1].
Following review of stable candidates, one fix was removed ("xfs: fix up non-directory creation in SGID directories") because some effects of this change are still being fixed upstream and there was no consensus on whether applying this fix alone is desirable. It is still in my queue and will be posted to stable later, once the remaining upstream issues are resolved.
Following review of another candidate, Dave has pointed me to a related fix that just got merged ("xfs: assert in xfs_btree_del_cursor should take into account error"), so I included it in my test tree and in this round of stable patches.
I would like to thank all the xfs developers that helped in the review of the stable candidates.
I would like to thank Samsung for contributing the hardware for the kdevops test environment, and especially Luis for his ongoing support of the test environment, which does most of the work for me :)
Thanks, Amir.
[1] https://lore.kernel.org/linux-xfs/20220601104547.260949-1-amir73il@gmail.com...
Brian Foster (3):
  xfs: sync lazy sb accounting on quiesce of read-only mounts
  xfs: restore shutdown check in mapped write fault path
  xfs: consider shutdown in bmapbt cursor delete assert

Darrick J. Wong (3):
  xfs: fix chown leaking delalloc quota blocks when fssetxattr fails
  xfs: fix incorrect root dquot corruption error when switching group/project quota types
  xfs: force log and push AIL to clear pinned inodes when aborting mount

Dave Chinner (1):
  xfs: assert in xfs_btree_del_cursor should take into account error

Jeffrey Mitchell (1):
  xfs: set inode size after creating symlink
 fs/xfs/libxfs/xfs_btree.c | 35 +++++++--------
 fs/xfs/xfs_dquot.c        | 39 +++++++++++++++-
 fs/xfs/xfs_iomap.c        |  3 ++
 fs/xfs/xfs_log.c          | 28 ++++++++----
 fs/xfs/xfs_log.h          |  1 +
 fs/xfs/xfs_mount.c        | 93 +++++++++++++++++++--------------------
 fs/xfs/xfs_qm.c           | 92 +++++++++++++++----------------------
 fs/xfs/xfs_symlink.c      |  1 +
 8 files changed, 158 insertions(+), 134 deletions(-)
From: Jeffrey Mitchell <jeffrey.mitchell@starlab.io>
commit 8aa921a95335d0a8c8e2be35a44467e7c91ec3e4 upstream.
When XFS creates a new symlink, it writes its size to disk but not to the VFS inode. This causes i_size_read() to return 0 for that symlink until it is re-read from disk, for example when the system is rebooted.
I found this inconsistency while protecting directories with eCryptFS. The command "stat path/to/symlink/in/ecryptfs" will report "Size: 0" if the symlink was created after the last reboot on an XFS root.
Call i_size_write() in xfs_symlink()
Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_symlink.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 8e88a7ca387e..8d3abf06c54f 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -300,6 +300,7 @@ xfs_symlink(
 		}
 		ASSERT(pathlen == 0);
 	}
+	i_size_write(VFS_I(ip), ip->i_d.di_size);
 
 	/*
 	 * Create the directory entry for the symlink.
From: Brian Foster <bfoster@redhat.com>
commit 50d25484bebe94320c49dd1347d3330c7063bbdb upstream.
xfs_log_sbcount() syncs the superblock specifically to accumulate the in-core percpu superblock counters and commit them to disk. This is required to maintain filesystem consistency across quiesce (freeze, read-only mount/remount) or unmount when lazy superblock accounting is enabled because individual transactions do not update the superblock directly.
This mechanism works as expected for writable mounts, but xfs_log_sbcount() skips the update for read-only mounts. Read-only mounts otherwise still allow log recovery and write out an unmount record during log quiesce. If a read-only mount performs log recovery, it can modify the in-core superblock counters and write an unmount record when the filesystem unmounts without ever syncing the in-core counters. This leaves the filesystem with a clean log but in an inconsistent state with regard to lazy sb counters.
Update xfs_log_sbcount() to use the same logic xfs_log_unmount_write() uses to determine when to write an unmount record. This ensures that lazy accounting is always synced before the log is cleaned. Refactor this logic into a new helper to distinguish between a writable filesystem and a writable log. Specifically, the log is writable unless the filesystem is mounted with the norecovery mount option, the underlying log device is read-only, or the filesystem is shutdown. Drop the freeze state check because the update is already allowed during the freezing process and no context calls this function on an already frozen fs. Also, retain the shutdown check in xfs_log_unmount_write() to catch the case where the preceding log force might have triggered a shutdown.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_log.c   | 28 ++++++++++++++++++++--------
 fs/xfs/xfs_log.h   |  1 +
 fs/xfs/xfs_mount.c |  3 +--
 3 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index fa2d05e65ff1..b445e63cbc3c 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -347,6 +347,25 @@ xlog_tic_add_region(xlog_ticket_t *tic, uint len, uint type)
 	tic->t_res_num++;
 }
 
+bool
+xfs_log_writable(
+	struct xfs_mount	*mp)
+{
+	/*
+	 * Never write to the log on norecovery mounts, if the block device is
+	 * read-only, or if the filesystem is shutdown. Read-only mounts still
+	 * allow internal writes for log recovery and unmount purposes, so don't
+	 * restrict that case here.
+	 */
+	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
+		return false;
+	if (xfs_readonly_buftarg(mp->m_log->l_targ))
+		return false;
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return false;
+	return true;
+}
+
 /*
  * Replenish the byte reservation required by moving the grant write head.
  */
@@ -886,15 +905,8 @@ xfs_log_unmount_write(
 {
 	struct xlog		*log = mp->m_log;
 
-	/*
-	 * Don't write out unmount record on norecovery mounts or ro devices.
-	 * Or, if we are doing a forced umount (typically because of IO errors).
-	 */
-	if (mp->m_flags & XFS_MOUNT_NORECOVERY ||
-	    xfs_readonly_buftarg(log->l_targ)) {
-		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
+	if (!xfs_log_writable(mp))
 		return;
-	}
 
 	xfs_log_force(mp, XFS_LOG_SYNC);
 
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 58c3fcbec94a..98c913da7587 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -127,6 +127,7 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
 int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
 void	  xfs_log_unmount(struct xfs_mount *mp);
 int	  xfs_log_force_umount(struct xfs_mount *mp, int logerror);
+bool	xfs_log_writable(struct xfs_mount *mp);
 
 struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
 void	  xfs_log_ticket_put(struct xlog_ticket *ticket);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 7110507a2b6b..a62b8a574409 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1176,8 +1176,7 @@ xfs_fs_writable(
 int
 xfs_log_sbcount(xfs_mount_t *mp)
 {
-	/* allow this to proceed during the freeze sequence... */
-	if (!xfs_fs_writable(mp, SB_FREEZE_COMPLETE))
+	if (!xfs_log_writable(mp))
 		return 0;
 
 	/*
From: "Darrick J. Wong" <djwong@kernel.org>
commit 1aecf3734a95f3c167d1495550ca57556d33f7ec upstream.
While refactoring the quota code to create a function to allocate inode change transactions, I noticed that xfs_qm_vop_chown_reserve does more than just make reservations: it also *modifies* the incore counts directly to handle the owner id change for the delalloc blocks.
I then observed that the fssetxattr code continues validating input arguments after making the quota reservation but before dirtying the transaction. If the routine decides to error out, it fails to undo the accounting switch! This leads to incorrect quota reservation and failure down the line.
We can fix this by making the reservation function do only that -- for the new dquot, it reserves ondisk and delalloc blocks to the transaction, and the old dquot hangs on to its incore reservation for now. Once we actually switch the dquots, we can then update the incore reservations because we've dirtied the transaction and it's too late to turn back now.
No fixes tag because this has been broken since the start of git.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_qm.c | 92 +++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 57 deletions(-)
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index b2a9abee8b2b..64e5da33733b 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1785,6 +1785,29 @@ xfs_qm_vop_chown(
 	xfs_trans_mod_dquot(tp, newdq, bfield, ip->i_d.di_nblocks);
 	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_ICOUNT, 1);
 
+	/*
+	 * Back when we made quota reservations for the chown, we reserved the
+	 * ondisk blocks + delalloc blocks with the new dquot. Now that we've
+	 * switched the dquots, decrease the new dquot's block reservation
+	 * (having already bumped up the real counter) so that we don't have
+	 * any reservation to give back when we commit.
+	 */
+	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_RES_BLKS,
+			-ip->i_delayed_blks);
+
+	/*
+	 * Give the incore reservation for delalloc blocks back to the old
+	 * dquot. We don't normally handle delalloc quota reservations
+	 * transactionally, so just lock the dquot and subtract from the
+	 * reservation. Dirty the transaction because it's too late to turn
+	 * back now.
+	 */
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	xfs_dqlock(prevdq);
+	ASSERT(prevdq->q_blk.reserved >= ip->i_delayed_blks);
+	prevdq->q_blk.reserved -= ip->i_delayed_blks;
+	xfs_dqunlock(prevdq);
+
 	/*
 	 * Take an extra reference, because the inode is going to keep
 	 * this dquot pointer even after the trans_commit.
@@ -1807,84 +1830,39 @@ xfs_qm_vop_chown_reserve(
 	uint			flags)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	uint64_t		delblks;
 	unsigned int		blkflags;
-	struct xfs_dquot	*udq_unres = NULL;
-	struct xfs_dquot	*gdq_unres = NULL;
-	struct xfs_dquot	*pdq_unres = NULL;
 	struct xfs_dquot	*udq_delblks = NULL;
 	struct xfs_dquot	*gdq_delblks = NULL;
 	struct xfs_dquot	*pdq_delblks = NULL;
-	int			error;
-
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
 	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
 
-	delblks = ip->i_delayed_blks;
 	blkflags = XFS_IS_REALTIME_INODE(ip) ?
 			XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
 
 	if (XFS_IS_UQUOTA_ON(mp) && udqp &&
-	    i_uid_read(VFS_I(ip)) != udqp->q_id) {
+	    i_uid_read(VFS_I(ip)) != udqp->q_id)
 		udq_delblks = udqp;
-		/*
-		 * If there are delayed allocation blocks, then we have to
-		 * unreserve those from the old dquot, and add them to the
-		 * new dquot.
-		 */
-		if (delblks) {
-			ASSERT(ip->i_udquot);
-			udq_unres = ip->i_udquot;
-		}
-	}
+
 	if (XFS_IS_GQUOTA_ON(ip->i_mount) && gdqp &&
-	    i_gid_read(VFS_I(ip)) != gdqp->q_id) {
+	    i_gid_read(VFS_I(ip)) != gdqp->q_id)
 		gdq_delblks = gdqp;
-		if (delblks) {
-			ASSERT(ip->i_gdquot);
-			gdq_unres = ip->i_gdquot;
-		}
-	}
 
 	if (XFS_IS_PQUOTA_ON(ip->i_mount) && pdqp &&
-	    ip->i_d.di_projid != pdqp->q_id) {
+	    ip->i_d.di_projid != pdqp->q_id)
 		pdq_delblks = pdqp;
-		if (delblks) {
-			ASSERT(ip->i_pdquot);
-			pdq_unres = ip->i_pdquot;
-		}
-	}
-
-	error = xfs_trans_reserve_quota_bydquots(tp, ip->i_mount,
-			udq_delblks, gdq_delblks, pdq_delblks,
-			ip->i_d.di_nblocks, 1, flags | blkflags);
-	if (error)
-		return error;
 
 	/*
-	 * Do the delayed blks reservations/unreservations now. Since, these
-	 * are done without the help of a transaction, if a reservation fails
-	 * its previous reservations won't be automatically undone by trans
-	 * code. So, we have to do it manually here.
+	 * Reserve enough quota to handle blocks on disk and reserved for a
+	 * delayed allocation. We'll actually transfer the delalloc
+	 * reservation between dquots at chown time, even though that part is
+	 * only semi-transactional.
 	 */
-	if (delblks) {
-		/*
-		 * Do the reservations first. Unreservation can't fail.
-		 */
-		ASSERT(udq_delblks || gdq_delblks || pdq_delblks);
-		ASSERT(udq_unres || gdq_unres || pdq_unres);
-		error = xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
-			    udq_delblks, gdq_delblks, pdq_delblks,
-			    (xfs_qcnt_t)delblks, 0, flags | blkflags);
-		if (error)
-			return error;
-		xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
-				udq_unres, gdq_unres, pdq_unres,
-				-((xfs_qcnt_t)delblks), 0, blkflags);
-	}
-
-	return 0;
+	return xfs_trans_reserve_quota_bydquots(tp, ip->i_mount, udq_delblks,
+			gdq_delblks, pdq_delblks,
+			ip->i_d.di_nblocks + ip->i_delayed_blks,
+			1, blkflags | flags);
 }
 
 int
From: "Darrick J. Wong" <djwong@kernel.org>
commit 45068063efb7dd0a8d115c106aa05d9ab0946257 upstream.
While writing up a regression test for broken behavior when a chprojid request fails, I noticed that we were logging corruption notices about the root dquot of the group/project quota file at mount time when testing V4 filesystems.
In commit afeda6000b0c, I was trying to improve ondisk dquot validation by making sure that when we load an ondisk dquot into memory on behalf of an incore dquot, the dquot id and type matches. Unfortunately, I forgot that V4 filesystems only have two quota files, and can switch that file between group and project quota types at mount time. When we perform that switch, we'll try to load the default quota limits from the root dquot prior to running quotacheck and log a corruption error when the types don't match.
This is inconsequential because quotacheck will reset the second quota file as part of doing the switch, but we shouldn't leave scary messages in the kernel log.
Fixes: afeda6000b0c ("xfs: validate ondisk/incore dquot flags")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_dquot.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 1d95ed387d66..80c4579d6835 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -500,6 +500,42 @@ xfs_dquot_alloc(
 	return dqp;
 }
 
+/* Check the ondisk dquot's id and type match what the incore dquot expects. */
+static bool
+xfs_dquot_check_type(
+	struct xfs_dquot	*dqp,
+	struct xfs_disk_dquot	*ddqp)
+{
+	uint8_t			ddqp_type;
+	uint8_t			dqp_type;
+
+	ddqp_type = ddqp->d_type & XFS_DQTYPE_REC_MASK;
+	dqp_type = xfs_dquot_type(dqp);
+
+	if (be32_to_cpu(ddqp->d_id) != dqp->q_id)
+		return false;
+
+	/*
+	 * V5 filesystems always expect an exact type match. V4 filesystems
+	 * expect an exact match for user dquots and for non-root group and
+	 * project dquots.
+	 */
+	if (xfs_sb_version_hascrc(&dqp->q_mount->m_sb) ||
+	    dqp_type == XFS_DQTYPE_USER || dqp->q_id != 0)
+		return ddqp_type == dqp_type;
+
+	/*
+	 * V4 filesystems support either group or project quotas, but not both
+	 * at the same time. The non-user quota file can be switched between
+	 * group and project quota uses depending on the mount options, which
+	 * means that we can encounter the other type when we try to load quota
+	 * defaults. Quotacheck will soon reset the entire quota file
+	 * (including the root dquot) anyway, but don't log scary corruption
+	 * reports to dmesg.
+	 */
+	return ddqp_type == XFS_DQTYPE_GROUP || ddqp_type == XFS_DQTYPE_PROJ;
+}
+
 /* Copy the in-core quota fields in from the on-disk buffer. */
 STATIC int
 xfs_dquot_from_disk(
@@ -512,8 +548,7 @@ xfs_dquot_from_disk(
 	 * Ensure that we got the type and ID we were looking for.
 	 * Everything else was checked by the dquot buffer verifier.
 	 */
-	if ((ddqp->d_type & XFS_DQTYPE_REC_MASK) != xfs_dquot_type(dqp) ||
-	    be32_to_cpu(ddqp->d_id) != dqp->q_id) {
+	if (!xfs_dquot_check_type(dqp, ddqp)) {
 		xfs_alert_tag(bp->b_mount, XFS_PTAG_VERIFIER_ERROR,
 			"Metadata corruption detected at %pS, quota %u",
 			__this_address, dqp->q_id);
From: Brian Foster <bfoster@redhat.com>
commit e4826691cc7e5458bcb659935d0092bcf3f08c20 upstream.
XFS triggers an iomap warning in the write fault path due to a !PageUptodate() page if a write fault happens to occur on a page that recently failed writeback. The iomap writeback error handling code can clear the Uptodate flag if no portion of the page is submitted for I/O. This is reproduced by fstest generic/019, which combines various forms of I/O with simulated disk failures that inevitably lead to filesystem shutdown (which then unconditionally fails page writeback).
This is a regression introduced by commit f150b4234397 ("xfs: split the iomap ops for buffered vs direct writes") due to the removal of a shutdown check and explicit error return in the ->iomap_begin() path used by the write fault path. The explicit error return historically translated to a SIGBUS, but now carries on with iomap processing where it complains about the unexpected state. Restore the shutdown check to xfs_buffered_write_iomap_begin() to restore historical behavior.
Fixes: f150b4234397 ("xfs: split the iomap ops for buffered vs direct writes")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_iomap.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7b9ff824e82d..74bc2beadc23 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -870,6 +870,9 @@ xfs_buffered_write_iomap_begin(
 	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
 
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
+
 	/* we can't use delayed allocations when using extent size hints */
 	if (xfs_get_extsz_hint(ip))
 		return xfs_direct_write_iomap_begin(inode, offset, count,
From: "Darrick J. Wong" <djwong@kernel.org>
commit d336f7ebc65007f5831e2297e6f3383ae8dbf8ed upstream.
If we allocate quota inodes in the process of mounting a filesystem but then decide to abort the mount, it's possible that the quota inodes are sitting around pinned by the log. Now that inode reclaim relies on the AIL to flush inodes, we have to force the log and push the AIL in between releasing the quota inodes and kicking off reclaim to tear down all the incore inodes. Do this by extracting the bits we need from the unmount path and reusing them. As an added bonus, failed writes during a failed mount will not retry forever now.
This was originally found during a fuzz test of metadata directories (xfs/1546), but the actual symptom was that reclaim hung up on the quota inodes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/xfs_mount.c | 90 +++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 46 deletions(-)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a62b8a574409..44b05e1d5d32 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -631,6 +631,47 @@ xfs_check_summary_counts(
 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
 }
 
+/*
+ * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
+ * internal inode structures can be sitting in the CIL and AIL at this point,
+ * so we need to unpin them, write them back and/or reclaim them before unmount
+ * can proceed.
+ *
+ * An inode cluster that has been freed can have its buffer still pinned in
+ * memory because the transaction is still sitting in a iclog. The stale inodes
+ * on that buffer will be pinned to the buffer until the transaction hits the
+ * disk and the callbacks run. Pushing the AIL will skip the stale inodes and
+ * may never see the pinned buffer, so nothing will push out the iclog and
+ * unpin the buffer.
+ *
+ * Hence we need to force the log to unpin everything first. However, log
+ * forces don't wait for the discards they issue to complete, so we have to
+ * explicitly wait for them to complete here as well.
+ *
+ * Then we can tell the world we are unmounting so that error handling knows
+ * that the filesystem is going away and we should error out anything that we
+ * have been retrying in the background. This will prevent never-ending
+ * retries in AIL pushing from hanging the unmount.
+ *
+ * Finally, we can push the AIL to clean all the remaining dirty objects, then
+ * reclaim the remaining inodes that are still in memory at this point in time.
+ */
+static void
+xfs_unmount_flush_inodes(
+	struct xfs_mount	*mp)
+{
+	xfs_log_force(mp, XFS_LOG_SYNC);
+	xfs_extent_busy_wait_all(mp);
+	flush_workqueue(xfs_discard_wq);
+
+	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
+
+	xfs_ail_push_all_sync(mp->m_ail);
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
+	xfs_reclaim_inodes(mp);
+	xfs_health_unmount(mp);
+}
+
 /*
  * This function does the following on an initial mount of a file system:
  *	- reads the superblock from disk and init the mount struct
@@ -1005,7 +1046,7 @@ xfs_mountfs(
 	/* Clean out dquots that might be in memory after quotacheck. */
 	xfs_qm_unmount(mp);
 	/*
-	 * Cancel all delayed reclaim work and reclaim the inodes directly.
+	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
 	 * two will have scheduled delayed reclaim for the rt/quota inodes.
 	 *
@@ -1015,11 +1056,8 @@ xfs_mountfs(
 	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
 	 * quota inodes.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
 out_log_dealloc:
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
 out_fail_wait:
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
@@ -1060,47 +1098,7 @@ xfs_unmountfs(
 	xfs_rtunmount_inodes(mp);
 	xfs_irele(mp->m_rootip);
 
-	/*
-	 * We can potentially deadlock here if we have an inode cluster
-	 * that has been freed has its buffer still pinned in memory because
-	 * the transaction is still sitting in a iclog. The stale inodes
-	 * on that buffer will be pinned to the buffer until the
-	 * transaction hits the disk and the callbacks run. Pushing the AIL will
-	 * skip the stale inodes and may never see the pinned buffer, so
-	 * nothing will push out the iclog and unpin the buffer. Hence we
-	 * need to force the log here to ensure all items are flushed into the
-	 * AIL before we go any further.
-	 */
-	xfs_log_force(mp, XFS_LOG_SYNC);
-
-	/*
-	 * Wait for all busy extents to be freed, including completion of
-	 * any discard operation.
-	 */
-	xfs_extent_busy_wait_all(mp);
-	flush_workqueue(xfs_discard_wq);
-
-	/*
-	 * We now need to tell the world we are unmounting. This will allow
-	 * us to detect that the filesystem is going away and we should error
-	 * out anything that we have been retrying in the background. This will
-	 * prevent neverending retries in AIL pushing from hanging the unmount.
-	 */
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
-
-	/*
-	 * Flush all pending changes from the AIL.
-	 */
-	xfs_ail_push_all_sync(mp->m_ail);
-
-	/*
-	 * Reclaim all inodes. At this point there should be no dirty inodes and
-	 * none should be pinned or locked. Stop background inode reclaim here
-	 * if it is still running.
-	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
 
 	xfs_qm_unmount(mp);
From: Brian Foster <bfoster@redhat.com>
commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.
The assert in xfs_btree_del_cursor() checks that the bmapbt block allocation field has been handled correctly before the cursor is freed. This field is used for accurate calculation of indirect block reservation requirements (for delayed allocations), for example. generic/019 reproduces a scenario where this assert fails because the filesystem has shutdown while in the middle of a bmbt record insertion. This occurs after a bmbt block has been allocated via the cursor but before the higher level bmap function (i.e. xfs_bmap_add_extent_hole_real()) completes and resets the field.
Update the assert to accommodate the transient state if the filesystem has shutdown. While here, clean up the indentation and comments in the function.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2d25bab68764..9f9f9feccbcd 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -353,20 +353,17 @@ xfs_btree_free_block(
  */
 void
 xfs_btree_del_cursor(
-	xfs_btree_cur_t		*cur,		/* btree cursor */
-	int			error)		/* del because of error */
+	struct xfs_btree_cur	*cur,		/* btree cursor */
+	int			error)		/* del because of error */
 {
-	int			i;		/* btree level */
+	int			i;		/* btree level */
 
 	/*
-	 * Clear the buffer pointers, and release the buffers.
-	 * If we're doing this in the face of an error, we
-	 * need to make sure to inspect all of the entries
-	 * in the bc_bufs array for buffers to be unlocked.
-	 * This is because some of the btree code works from
-	 * level n down to 0, and if we get an error along
-	 * the way we won't have initialized all the entries
-	 * down to 0.
+	 * Clear the buffer pointers and release the buffers. If we're doing
+	 * this because of an error, inspect all of the entries in the bc_bufs
+	 * array for buffers to be unlocked. This is because some of the btree
+	 * code works from level n down to 0, and if we get an error along the
+	 * way we won't have initialized all the entries down to 0.
 	 */
 	for (i = 0; i < cur->bc_nlevels; i++) {
 		if (cur->bc_bufs[i])
@@ -374,17 +371,11 @@ xfs_btree_del_cursor(
 		else if (!error)
 			break;
 	}
-	/*
-	 * Can't free a bmap cursor without having dealt with the
-	 * allocated indirect blocks' accounting.
-	 */
-	ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP ||
-	       cur->bc_ino.allocated == 0);
-	/*
-	 * Free the cursor.
-	 */
+
+	ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 ||
+	       XFS_FORCED_SHUTDOWN(cur->bc_mp));
 	if (unlikely(cur->bc_flags & XFS_BTREE_STAGING))
-		kmem_free((void *)cur->bc_ops);
+		kmem_free(cur->bc_ops);
 	kmem_cache_free(xfs_btree_cur_zone, cur);
 }
From: Dave Chinner <dchinner@redhat.com>
commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
The problem was that an allocation failed unexpectedly in xfs_bmbt_alloc_block() after roughly 150,000 minlen allocation error injections, resulting in an EFSCORRUPTED error being returned to xfs_bmapi_write(). The error occurred on extent-to-btree format conversion allocating the new root block:
RIP: 0010:xfs_bmbt_alloc_block+0x177/0x210
Call Trace:
 <TASK>
 xfs_btree_new_iroot+0xdf/0x520
 xfs_btree_make_block_unfull+0x10d/0x1c0
 xfs_btree_insrec+0x364/0x790
 xfs_btree_insert+0xaa/0x210
 xfs_bmap_add_extent_hole_real+0x1fe/0x9a0
 xfs_bmapi_allocate+0x34c/0x420
 xfs_bmapi_write+0x53c/0x9c0
 xfs_alloc_file_space+0xee/0x320
 xfs_file_fallocate+0x36b/0x450
 vfs_fallocate+0x148/0x340
 __x64_sys_fallocate+0x3c/0x70
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa
Why the allocation failed at this point is unknown, but it is likely that we ran the transaction out of reserved space and the filesystem out of space for bmbt blocks because of all the minlen allocations being done, causing worst case fragmentation of a large allocation.
Regardless of the cause, we've then called xfs_bmapi_finish() which calls xfs_btree_del_cursor(cur, error) to tear down the cursor.
So we have a failed operation, error != 0, cur->bc_ino.allocated > 0 and the filesystem is still up. The assert fails to take into account that allocation can fail with an error and the transaction teardown will shut the filesystem down if necessary. i.e. the assert needs to check "|| error != 0" as well, because at this point shutdown is pending because the current transaction is dirty....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/libxfs/xfs_btree.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 9f9f9feccbcd..98c82f4935e1 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -372,8 +372,14 @@ xfs_btree_del_cursor(
 			break;
 	}
 
+	/*
+	 * If we are doing a BMBT update, the number of unaccounted blocks
+	 * allocated during this cursor life time should be zero. If it's not
+	 * zero, then we should be shut down or on our way to shutdown due to
+	 * cancelling a dirty transaction on error.
+	 */
 	ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 ||
-	       XFS_FORCED_SHUTDOWN(cur->bc_mp));
+	       XFS_FORCED_SHUTDOWN(cur->bc_mp) || error != 0);
 	if (unlikely(cur->bc_flags & XFS_BTREE_STAGING))
 		kmem_free(cur->bc_ops);
 	kmem_cache_free(xfs_btree_cur_zone, cur);
On Mon, Jun 06, 2022 at 05:32:55PM +0300, Amir Goldstein wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
>
> xfs/538 on a 1kB block filesystem failed with this assert:
>
> XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
You haven't mentioned that you combined a second upstream commit into this patch to fix the bug in this commit.....
Cheers,
Dave.
On Tue, Jun 7, 2022 at 12:30 AM Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Jun 06, 2022 at 05:32:55PM +0300, Amir Goldstein wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
> >
> > xfs/538 on a 1kB block filesystem failed with this assert:
> >
> > XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
>
> You haven't mentioned that you combined a second upstream commit into this patch to fix the bug in this commit.....
I am confused.
patch [5.10 7/8] xfs: consider shutdown in bmapbt cursor delete assert is the patch that I backported from 5.12 and posted for review. This patch [5.10 8/8] is the patch from 5.19-rc1 that you pointed out that I should take to fix the bug in patch [5.10 7/8].
The upstream commits correspond to the original commits each.
If I missed anything, please correct me and I will fix it.
Thanks, Amir.
On Tue, Jun 07, 2022 at 01:33:06AM +0300, Amir Goldstein wrote:
> On Tue, Jun 7, 2022 at 12:30 AM Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, Jun 06, 2022 at 05:32:55PM +0300, Amir Goldstein wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
> > >
> > > xfs/538 on a 1kB block filesystem failed with this assert:
> > >
> > > XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
> >
> > You haven't mentioned that you combined a second upstream commit into this patch to fix the bug in this commit.....
>
> I am confused.
>
> patch [5.10 7/8] xfs: consider shutdown in bmapbt cursor delete assert is the patch that I backported from 5.12 and posted for review. This patch [5.10 8/8] is the patch from 5.19-rc1 that you pointed out that I should take to fix the bug in patch [5.10 7/8].
Sorry, I missed that this was a new patch because the set looked the same as the last posting and you said in the summary letter:
"These fixes have been posted to review on xfs list [1]."
Except this patch *wasn't part of that set*. I mistook it for the patch that introduced the assert because I assumed, from the above statement, the absence of a changelog in the cover letter, and the fact that you'd sent it to Greg for inclusion, that *no new patches had been added*.
Add to that the fact I rushed through them because I saw that Greg has already queued these before anyone had any time to actually review the posting. Hence the timing of the release of unreviewed patches has been taken completely out of our control, and so I rushed through them and misinterpreted what I was seeing.
That's not how the review process is supposed to work. You need to wait for people to review the changes and ACK them before then asking for them to be merged into the stable trees. You need to have changelogs in your summary letters. You need to do all the things that you've been complaining bitterly about over the past month that upstream developers weren't doing 18 months ago.
And I notice that you've already sent out yet another set of stable patches for review despite the paint not even being dry on these ones. Not to mention that there's another set of different 5.15 stable patches out for review as well.
This is not a sustainable process.
Seriously: slow down and stop being so damn aggressive. Let everyone catch up and build sustainable processes and timetables. If you keep going like this, you are going to break people.
-Dave.
On Tue, Jun 07, 2022 at 01:01:47PM +1000, Dave Chinner wrote:
On Tue, Jun 07, 2022 at 01:33:06AM +0300, Amir Goldstein wrote:
On Tue, Jun 7, 2022 at 12:30 AM Dave Chinner david@fromorbit.com wrote:
On Mon, Jun 06, 2022 at 05:32:55PM +0300, Amir Goldstein wrote:
From: Dave Chinner dchinner@redhat.com
commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
You haven't mentioned that you combined a second upstream commit into this patch to fix the bug in this commit.....
I am confused.
patch [5.10 7/8] xfs: consider shutdown in bmapbt cursor delete assert is the patch that I backported from 5.12 and posted for review. This patch [5.10 8/8] is the patch from 5.19-rc1 that you pointed out that I should take to fix the bug in patch [5.10 7/8].
Sorry, I missed that this was a new patch because the set looked the same as the last posting and you said in the summary letter:
"These fixes have been posted to review on xfs list [1]."
Except this patch *wasn't part of that set*. I mistook it for the patch that introduced the assert because I assumed, from the above statement, the absence of a changelog in the cover letter, and the fact that you'd sent it to Greg for inclusion, that *no new patches had been added*.
Add to that the fact I rushed through them because I saw that Greg has already queued these before anyone had any time to actually review the posting. Hence the timing of the release of unreviewed patches has been taken completely out of our control, and so I rushed through them and misinterpreted what I was seeing.
That's not how the review process is supposed to work. You need to wait for people to review the changes and ACK them before then asking for them to be merged into the stable trees. You need to have changelogs in your summary letters. You need to do all the things that you've been complaining bitterly about over the past month that upstream developers weren't doing 18 months ago.
I thought these had already been reviewed, which is why I took them.
And there are still days before these go anywhere; just adding them to the stable queue doesn't mean they are "set in stone".
Heck, even if they do get merged into a stable release, 'git revert' is our friend here, and we can easily revert anything that is found to be wrong.
And I notice that you've already sent out yet another set of stable patches for review despite the paint not even being dry on these ones. Not to mention that there's another set of different 5.15 stable patches out for review as well.
This is not a sustainable process.
Seriously: slow down and stop being so damn aggressive. Let everyone catch up and build sustainable processes and timetables. If you keep going like this, you are going to break people.
What am I supposed to do here, not take patches you all send me? Wait X number of days?
totally confused,
greg k-h
On Tue, Jun 7, 2022 at 7:47 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 07, 2022 at 01:01:47PM +1000, Dave Chinner wrote:
On Tue, Jun 07, 2022 at 01:33:06AM +0300, Amir Goldstein wrote:
On Tue, Jun 7, 2022 at 12:30 AM Dave Chinner david@fromorbit.com wrote:
On Mon, Jun 06, 2022 at 05:32:55PM +0300, Amir Goldstein wrote:
From: Dave Chinner dchinner@redhat.com
commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
You haven't mentioned that you combined a second upstream commit into this patch to fix the bug in this commit.....
I am confused.
patch [5.10 7/8] xfs: consider shutdown in bmapbt cursor delete assert is the patch that I backported from 5.12 and posted for review. This patch [5.10 8/8] is the patch from 5.19-rc1 that you pointed out that I should take to fix the bug in patch [5.10 7/8].
Sorry, I missed that this was a new patch because the set looked the same as the last posting and you said in the summary letter:
"These fixes have been posted to review on xfs list [1]."
Except this patch *wasn't part of that set*. I mistook it for the patch that introduced the assert because I assumed, from the above statement, the absence of a changelog in the cover letter, and the fact that you'd sent it to Greg for inclusion, that *no new patches had been added*.
Add to that the fact I rushed through them because I saw that Greg has already queued these before anyone had any time to actually review the posting. Hence the timing of the release of unreviewed patches has been taken completely out of our control, and so I rushed through them and misinterpreted what I was seeing.
That's not how the review process is supposed to work. You need to wait for people to review the changes and ACK them before then asking for them to be merged into the stable trees. You need to have changelogs in your summary letters. You need to do all the things that you've been complaining bitterly about over the past month that upstream developers weren't doing 18 months ago.
I thought these had already been reviewed, which is why I took them.
And there still are days before these go anywhere, just adding them to the stable queue doesn't mean they are "set in stone".
Heck, even if they do get merged into a stable release, 'git revert' is our friend here, and we can easily revert anything that is found to be wrong.
And I notice that you've already sent out yet another set of stable patches for review despite the paint not even being dry on these ones. Not to mention that there's another set of different 5.15 stable patches out for review as well.
This is not a sustainable process.
Seriously: slow down and stop being so damn aggressive. Let everyone catch up and build sustainable processes and timetables. If you keep going like this, you are going to break people.
What am I supposed to do here, not take patches you all send me? Wait X number of days?
totally confused,
I think the above was addressing me. I should be managing the review and grace period of xfs stable candidates for 5.10 and should adapt my rhythm to the xfs developers' requests.
When I send patches to stable, they are supposed to be good to go, so you should not worry about that.
The patches in this posting are according to xfs developers suggestion as elaborated in the cover letter, but there was a breakage in my process that caused this alarm.
I am going to fix it going forward.
Thanks, Amir.
On Tue, Jun 7, 2022 at 6:01 AM Dave Chinner david@fromorbit.com wrote:
On Tue, Jun 07, 2022 at 01:33:06AM +0300, Amir Goldstein wrote:
On Tue, Jun 7, 2022 at 12:30 AM Dave Chinner david@fromorbit.com wrote:
On Mon, Jun 06, 2022 at 05:32:55PM +0300, Amir Goldstein wrote:
From: Dave Chinner dchinner@redhat.com
commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
You haven't mentioned that you combined a second upstream commit into this patch to fix the bug in this commit.....
I am confused.
patch [5.10 7/8] xfs: consider shutdown in bmapbt cursor delete assert is the patch that I backported from 5.12 and posted for review. This patch [5.10 8/8] is the patch from 5.19-rc1 that you pointed out that I should take to fix the bug in patch [5.10 7/8].
Sorry, I missed that this was a new patch because the set looked the same as the last posting and you said in the summary letter:
"These fixes have been posted to review on xfs list [1]."
Sorry, I forgot to edit this part of the template.
Except this patch *wasn't part of that set*. I mistook it for the patch that introduced the assert because I assumed, from the above statement, the absence of a changelog in the cover letter, and the fact that you'd sent it to Greg for inclusion, that *no new patches had been added*.
Add to that the fact I rushed through them because I saw that Greg has already queued these before anyone had any time to actually review the posting. Hence the timing of the release of unreviewed patches has been taken completely out of our control, and so I rushed through them and misinterpreted what I was seeing.
That's not how the review process is supposed to work. You need to wait for people to review the changes and ACK them before then asking for them to be merged into the stable trees. You need to have changelogs in your summary letters. You need to do all the things that you've been complaining bitterly about over the past month that upstream developers weren't doing 18 months ago.
Of course I need to do all these things. If I am not doing them it could be because I made a mistake or misunderstood something. I am always trying to improve and incorporate feedback on my mistakes.
Regarding changelogs, I do not understand what you mean. Isn't that a changelog at the bottom of my cover letter? Do you mean something else?
Regarding explicit ACKs, I wasn't sure what to do. Ted has asked this on this thread [1].
[1] https://lore.kernel.org/linux-fsdevel/YieG8rZkgnfwygyu@mit.edu/
I asked this in my reply [2] to Darrick's email, but got no reply:
:Should we post to xfs list and wait for explicit ACK/RVB on every patch?
:Should we post to xfs list and if no objections are raised post to stable?
[2] https://lore.kernel.org/linux-xfs/CAOQ4uxjtOJ_=65MXVv0Ry0Z224dBxeLJ44FB_O-Nr...
Since I had no explicit rules, I used my common sense, which is a recipe for misunderstandings... :-/
I posted the candidates one week ago, so I thought everyone had the opportunity to comment. You gave me comments on patches 1 and 7 so I had assumed that you had seen the entire series and had no objections to the rest.
I incorporated your feedback and wrote my plans in this email [3]
:Just to make sure we are all on the same page.
:
:I have applied both patches to my test tree:
:1. 1cd738b13ae9 xfs: consider shutdown in bmapbt cursor delete assert
:2. 56486f307100 xfs: assert in xfs_btree_del_cursor should take into
:   account error
:
:Patch #2 looks pretty safe and it only affects builds with XFS_WARN/DEBUG,
:so I am not too concerned about a soaking period.
:I plan to send it along with patch #1 to stable after a few more test runs.
Once again, I *assumed* that you saw that because this was part of an ongoing conversation, not some random email.
4 days later (after more testing) I did what I said I would do and posted the revised series to stable with feedback incorporated. This was also detailed in the cover letter to stable.
[3] https://lore.kernel.org/linux-xfs/CAOQ4uxhxLRTUfyhSy9D6nsGdVANrUOgRM8msVPVmF...
If this situation repeats itself in the future, I will post v2 to xfs list first.
And I notice that you've already sent out yet another set of stable patches for review despite the paint not even being dry on these ones. Not to mention that there's another set of different 5.15 stable patches out for review as well.
I will pace myself going forward and collaborate more closely with Leah. I have two years of kernel releases to catch up with, but once we reach the point of selecting patches from current releases, hopefully some of the reviews of candidates for the different LTS kernels will be shared.
This is not a sustainable process.
Seriously: slow down and stop being so damn aggressive. Let everyone catch up and build sustainable processes and timetables. If you keep going like this, you are going to break people.
I do not want that.
Let me explain my process. As I described in the meta-cover letter [4] for the multi part series:
:My intention is to post the parts for review on the xfs list on
:a ~weekly basis and forward them to stable only after xfs developers
:have had the chance to review the selection.
[4] https://lore.kernel.org/linux-xfs/20220525111715.2769700-1-amir73il@gmail.co...
To you it may appear that the "paint is not even dry on these ones", but from my perspective part 2 was already out of review and part 3 had already been spinning in xfstests for a week, so I wanted to post those patches and give as much time for review as needed.
My idea of sustainable was posting ~7 stable candidates per week. This pace may be a bit higher than the normal flow of fixes, but I do need to catch up with two years of fixes, so the rate has to be a bit higher than the normal rate of fixes going into upstream.
I had to choose something based on no other feedback, but of course the idea is to take feedback and incorporate it into the process in order to make it sustainable.
I will do my best to amend the things that were broken in this posting. I am sure this is not the last time I am going to make mistakes. I am trying to fix a process that has been broken for years. I invest a lot of my time and energy in this effort.
My request is that you assume good intentions on my part and if there are rules that you want me to follow please spell them out, so I won't end up guessing wrong again.
Thanks, Amir.
On Tue, Jun 07, 2022 at 09:09:47AM +0300, Amir Goldstein wrote:
On Tue, Jun 7, 2022 at 6:01 AM Dave Chinner david@fromorbit.com wrote:
On Tue, Jun 07, 2022 at 01:33:06AM +0300, Amir Goldstein wrote:
On Tue, Jun 7, 2022 at 12:30 AM Dave Chinner david@fromorbit.com wrote:
On Mon, Jun 06, 2022 at 05:32:55PM +0300, Amir Goldstein wrote:
From: Dave Chinner dchinner@redhat.com
commit 56486f307100e8fc66efa2ebd8a71941fa10bf6f upstream.
xfs/538 on a 1kB block filesystem failed with this assert:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
You haven't mentioned that you combined a second upstream commit into this patch to fix the bug in this commit.....
I am confused.
patch [5.10 7/8] xfs: consider shutdown in bmapbt cursor delete assert is the patch that I backported from 5.12 and posted for review. This patch [5.10 8/8] is the patch from 5.19-rc1 that you pointed out that I should take to fix the bug in patch [5.10 7/8].
Sorry, I missed that this was a new patch because the set looked the same as the last posting and you said in the summary letter:
"These fixes have been posted to review on xfs list [1]."
Sorry, I forgot to edit this part of the template.
Except this patch *wasn't part of that set*. I mistook it for the patch that introduced the assert because I assumed, from the above statement, the absence of a changelog in the cover letter, and the fact that you'd sent it to Greg for inclusion, that *no new patches had been added*.
Add to that the fact I rushed through them because I saw that Greg has already queued these before anyone had any time to actually review the posting. Hence the timing of the release of unreviewed patches has been taken completely out of our control, and so I rushed through them and misinterpreted what I was seeing.
That's not how the review process is supposed to work. You need to wait for people to review the changes and ACK them before then asking for them to be merged into the stable trees. You need to have changelogs in your summary letters. You need to do all the things that you've been complaining bitterly about over the past month that upstream developers weren't doing 18 months ago.
Of course I need to do all these things. If I am not doing them it could be because I made a mistake or misunderstood something. I am always trying to improve and incorporate feedback on my mistakes.
One thing I've noticed watching the candidate series going by on the list -- is there something consistent that could go in the subject line of a candidate series from the first posting all the way until the stable maintainers queue them up?
"[PATCH 5.10 v2 0/8] xfs fixes for 5.10.y (part 2)"
isn't quite specific enough to make it easy to find the threads in mutt, at least not if they're all called "xfs fixes for 5.10.y". I'm not sure what to suggest here though -- pick two random dictionary words?
"xfs: fixes for 5.10.y (trews sphenic)"
But that just looks like your $pet walked over the keyboard. Maybe something boring like the date the candidate series was first posted?
Regarding changelogs, I do not understand what you mean. Isn't that a changelog at the bottom of my cover letter? Do you mean something else?
I /think/ Dave wanted to see something at the bottom of the cover letter like this. Actually, I won't speak for Dave, but this is what I'd like to see if you make substantive changes between the CANDIDATE patchset and the one that actually gets sent to -stable:
"I would like to thank Samsung for contributing the hardware for the kdevops test environment and especially to Luis for his ongoing support in the test environment, which does most of the work for me :)
"v3: Change frob to brof, per maintainer request.
"v2: Add patch 7, which it turns out was a hidden prerequisite for patch 8.
"Thanks, Amir."
Regarding explicit ACKs, I wasn't sure what to do.
Me neither. It feels a little funny to ACK a patch in a stable queue that I already RVB'd for upstream, but is that peoples' expectation?
Ted has asked this on this thread [1].
[1] https://lore.kernel.org/linux-fsdevel/YieG8rZkgnfwygyu@mit.edu/
I asked this in my reply [2] to Darrick's email, but got no reply:
:Should we post to xfs list and wait for explicit ACK/RVB on every patch?
:Should we post to xfs list and if no objections are raised post to stable?
[2] https://lore.kernel.org/linux-xfs/CAOQ4uxjtOJ_=65MXVv0Ry0Z224dBxeLJ44FB_O-Nr...
TBH there have been enough stable process discussion threads in the past month that I've probably lost track of ... well, clearly, those two. :/
Since I had no explicit rules, I used my common sense, which is a recipe for misunderstandings... :-/
I posted the candidates one week ago, so I thought everyone had the opportunity to comment. You gave me comments on patches 1 and 7 so I had assumed that you had seen the entire series and had no objections to the rest.
I incorporated your feedback and wrote my plans in this email [3]
I'm going to offer my (probably controversial) opinion here: I would like to delegate /all/ of the power and responsibility for stable maintenance to all of you (Amir/Leah/Chandan/etc.) who are doing the work. What Amir did here (send a candidate patchset, take suggestions for a week, run the batch through fstests) is more or less what I'd expect from whoever owns the LTS backport process.
Frankly, I don't see so much difference between what I do between -rc1 and -rc4 and what Amir's doing -- propose a bunch of fixpatches, wait a few days, and if there aren't any strenuous objections, send them on their way. IOWS, I own whatever's going to the upstream tree; Amir owns whatever's going to 5.10; Leah and Chandan own whatever's going to 5.15; and (I guess?) Chandan owns whatever's going to 5.4.
I think Dave's afraid that if something goes wrong with an LTS kernel then the downstream consumers of those LTS kernels will show up on the list with bug reports and asking for fixes, and that will just put more pressure on the upstream maintainers since those consumers don't care about who they're talking to, they just want a resolution. But I'll let him express his thoughts.
I've been pondering this overnight, and I don't agree that the above scenario is the inevitable outcome. Are you (LTS branch maintainers) willing to put your names in the MAINTAINERS file for the LTS kernels and let us (upstream maintainers) point downstream consumers (and their bug reports) of those branches at you? If so, then I see that as effective delegation of responsibilities to people who have taken ownership of the LTS branches, not merely blame shifting.
If yes, then ... do as you like, you're the branch owner. I expect things to be rocky for a while, but it beats burning myself out with too many responsibilities that I have not been able to keep up with. It's probably better for the long term stability of each LTS branch if there's one or two people who are really familiar with how that LTS has been doing, instead of me trying and failing to multiplex.
Also, as Greg points out, at worst, you /can/ decide to revert something that initially looked good but later turned out smelly.
The one thing I /will/ caution you about -- watch out for changes that affect what gets persisted to disk. Those need more review because fixing those things up after the fact (look at the AGFL padding corrections we had to do to fix some uncaught problems upstream) is /not/ fun.
:Just to make sure we are all on the same page.
:
:I have applied both patches to my test tree:
:1. 1cd738b13ae9 xfs: consider shutdown in bmapbt cursor delete assert
:2. 56486f307100 xfs: assert in xfs_btree_del_cursor should take into
:   account error
:
:Patch #2 looks pretty safe and it only affects builds with XFS_WARN/DEBUG,
:so I am not too concerned about a soaking period.
:I plan to send it along with patch #1 to stable after a few more test runs.
Once again, I *assumed* that you saw that because this was part of an ongoing conversation, not some random email.
4 days later (after more testing) I did what I said I would do and posted the revised series to stable with feedback incorporated. This was also detailed in the cover letter to stable.
[3] https://lore.kernel.org/linux-xfs/CAOQ4uxhxLRTUfyhSy9D6nsGdVANrUOgRM8msVPVmF...
If this situation repeats itself in the future, I will post v2 to xfs list first.
Yes, please do. It has been customary to push a final patchset for 24h prior to submission just to see if anything shakes out at the last minute.
And I notice that you've already sent out yet another set of stable patches for review despite the paint not even being dry on these ones. Not to mention that there's another set of different 5.15 stable patches out for review as well.
I will pace myself going forward and collaborate more closely with Leah. I have two years of kernel releases to catch up with, but once we reach the point of selecting patches from current releases, hopefully some of the reviews of candidates for the different LTS kernels will be shared.
Admittedly, a new patchset every week is kind of a lot to pick through with everything /else/ going on ... particularly this week, since I /was/ busy trying to get the online fsck design doc review started, and dealing with a CVE for the grub xfs driver, and also trying to get the new LARP/NREXT64 itself sorted. That said, a couple of years is a lot of stuff to get through, so if we need to do a continuous trickle of patches to get caught up then so be it.
As long as you all end up doing a better job with LTS maintenance than I was doing, it's an improvement, even with the lumps and bumps. :)
This is not a sustainable process.
Seriously: slow down and stop being so damn aggressive. Let everyone catch up and build sustainable processes and timetables. If you keep going like this, you are going to break people.
I do not want that.
Let me explain my process. As I described in the meta-cover letter [4] for the multi part series:
:My intention is to post the parts for review on the xfs list on
:a ~weekly basis and forward them to stable only after xfs developers
:have had the chance to review the selection.
[4] https://lore.kernel.org/linux-xfs/20220525111715.2769700-1-amir73il@gmail.co...
To you it may appear that the "paint is not even dry on these ones", but from my perspective part 2 was already out of review and part 3 had already been spinning in xfstests for a week, so I wanted to post those patches and give as much time for review as needed.
My idea of sustainable was posting ~7 stable candidates per week. This pace may be a bit higher than the normal flow of fixes, but I do need to catch up with two years of fixes, so the rate has to be a bit higher than the normal rate of fixes going into upstream.
Pipelining is a bit more troublesome -- it's difficult for me to concentrate on two similarly named patchsets at the same time. If you have (say) part 3 running through its paces internally but only post part 3 after Greg lands part 2, that sounds acceptable to me.
I had to choose something based on no other feedback, but of course the idea is to take feedback and incorporate it into the process in order to make it sustainable.
I will do my best to amend the things that were broken in this posting. I am sure this is not the last time I am going to make mistakes. I am trying to fix a process that has been broken for years. I invest a lot of my time and energy in this effort.
My request is that you assume good intentions on my part and if there are rules that you want me to follow please spell them out, so I won't end up guessing wrong again.
I know, and thank you all for your willingness to take on LTS maintainership. There will undoubtedly be more, uh, learning opportunities, but I'll try to roll with them as gently as I can. Emotional self-regulation has not been one of my strong points the last year or so.
--D
Thanks, Amir.
On Tue, Jun 07, 2022 at 11:35:50AM -0700, Darrick J. Wong wrote:
On Tue, Jun 07, 2022 at 09:09:47AM +0300, Amir Goldstein wrote:
Regarding explicit ACKs, I wasn't sure what to do.
Me neither. It feels a little funny to ACK a patch in a stable queue that I already RVB'd for upstream, but is that peoples' expectation?
I think that's up to us, in particular because we don't want bots to do this work for XFS, and so we must define what we feel comfortable with.
How about this: have at least one XFS maintainer Ack each of the patches intended for stable. If no Acks come back within *2 weeks*, the stable branch maintainers can use their discretion and forward the patches to the stable team.
I incorporated your feedback and wrote my plans in this email [3]
I'm going to offer my (probably controversial) opinion here: I would like to delegate /all/ of the power and responsibility for stable maintenance to all of you (Amir/Leah/Chandan/etc.) who are doing the work. What Amir did here (send a candidate patchset, take suggestions for a week, run the batch through fstests) is more or less what I'd expect from whoever owns the LTS backport process.
I'm happy to provide advice as a paranoid developer who has seen the incredibly odd things come up before and had to deal with them. People can either ignore it or take it.
I've been pondering this overnight, and I don't agree that the above scenario is the inevitable outcome. Are you (LTS branch maintainers) willing to put your names in the MAINTAINERS file for the LTS kernels and let us (upstream maintainers) point downstream consumers (and their bug reports) of those branches at you? If so, then I see that as effective delegation of responsibilities to people who have taken ownership of the LTS branches, not merely blame shifting.
*This* is why, when I volunteered to do xfs stable work a long time ago, my own bar for regression testing was and still remains *very high*. You really need to have the fear of $deity in terms of responsibility, because without that it is not fair to upstream developers to have loose cannons picking stable candidate patches for a filesystem. As anyone doing "enterprise" work could attest, *that* work alone requires a lot of time, and so one can't realistically multitask both.
It is also why this work eventually stopped: I lost my test rig after one $EMPLOYER change, and my focus was more on the "enterprise" side at SUSE. It is why, after yet another $EMPLOYER change, this remains a long-term priority, and I try to do what I can to help with this.
If we care about stable we need to seriously consider a scalable solution which *includes* testing. And I also think even the filesystems whose stable fixes are selected by the bot could benefit from these best practices as well.
If yes, then ... do as you like, you're the branch owner. I expect things to be rocky for a while, but it beats burning myself out with too many responsibilities that I have not been able to keep up with. It's probably better for the long term stability of each LTS branch if there's one or two people who are really familiar with how that LTS has been doing, instead of me trying and failing to multiplex.
Also, as Greg points out, at worst, you /can/ decide to revert something that initially looked good but later turned out smelly.
If testing is paranoid the chances of these reverts are reduced. The reason for paranoia is reasonably higher for filesystems since we don't want to break a user's filesystem. It is up to each stable branch maintainer to decide their level of paranoia, as they incur the responsibility for the branch.
Luis
Of course I need to do all these things. If I am not doing them it could be because I made a mistake or misunderstood something. I am always trying to improve and incorporate feedback on my mistakes.
One thing I've noticed watching the candidate series going by on the list -- is there something consistent that could go in the subject line of a candidate series from the first posting all the way until the stable maintainers queue them up?
"[PATCH 5.10 v2 0/8] xfs fixes for 5.10.y (part 2)"
isn't quite specific enough to make it easy to find the threads in mutt, at least not if they're all called "xfs fixes for 5.10.y". I'm not sure what to suggest here though -- pick two random dictionary words?
"xfs: fixes for 5.10.y (trews sphenic)"
But that just looks like your $pet walked over the keyboard. Maybe something boring like the date the candidate series was first posted?
That's good feedback!
In my tree this part 2 is tag xfs-5.10.y-2 which is annotated as:
5.10.y xfs backports circa 5.12
I can go with something along those lines. It will not always be possible to use the origin release as a description, but the word backports is pretty descriptive and should not collide with anything else going through xfs list.
I was also pondering how to reflect the transition from candidate to stable posting.
Note that from part 1 to part 2 I added CANDIDATE to the review post.
The choice of going with v2 for the stable posting was just the best idea I came up with; I am not sure it was a good one. It would certainly have been better if I had included a changelog since v1. In retrospect, I should have posted a v2 CANDIDATE and only then posted to stable.
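For what it's worth, the CANDIDATE marker and v2 numbering discussed above map directly onto git format-patch options. A rough sketch follows; the tag name matches my tree, but the base ref "origin/linux-5.10.y" is only a stand-in for whatever 5.10.y baseline the backports apply to:

```shell
# Illustrative sketch only: generating a v2 CANDIDATE review posting.
# The tag xfs-5.10.y-2 is from my tree; the base ref is hypothetical.
git tag -a xfs-5.10.y-2 -m "5.10.y xfs backports circa 5.12"
git format-patch --cover-letter -v2 \
    --subject-prefix="PATCH CANDIDATE 5.10" \
    origin/linux-5.10.y..xfs-5.10.y-2
# Subjects come out as "[PATCH CANDIDATE 5.10 v2 N/M] ...", so the same
# marker stays searchable from the first posting through the stable
# submission (where the CANDIDATE word would simply be dropped).
```

Dropping `CANDIDATE` from `--subject-prefix` while keeping the same `-v` count would then distinguish the final stable posting from the review postings.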
Regarding changelogs, I do not understand what you mean. Isn't that a changelog at the bottom of my cover letter? Do you mean something else?
I /think/ Dave wanted to see something at the bottom of the cover letter like this. Actually, I won't speak for Dave, but this is what I'd like to see if you make substantive changes between the CANDIDATE patchset and the one that actually gets sent to -stable:
"I would like to thank Samsung for contributing the hardware for the kdevops test environment and especially to Luis for his ongoing support in the test environment, which does most of the work for me :)
"v3: Change frob to brof, per maintainer request. v2: Add patch 7, which it turns out was a hidden prerequisite for patch 8.
Gosh! Of course. Total blackout.
I had a mental perception of a RESEND that drops the CANDIDATE tag, so I completely forgot about that.
The clear mistake here was to not post CANDIDATE v2 to xfs. And it is not out of laziness or eagerness that I didn't post v2. I was trying to reduce the already loud noise that this effort is causing to a minimum, but in retrospect, the risk of creating friction due to misunderstanding was not worth it.
Regarding explicit ACKs, I wasn't sure what to do.
Me neither. It feels a little funny to ACK a patch in a stable queue that I already RVB'd for upstream, but is that peoples' expectation?
My goal is not to take away control from xfs maintainers. My goal is to lower the burden on xfs maintainers. It would seem to me that requiring explicit ACK per patch is not the formula for reducing burden.
Luis suggested taking the earlier of an explicit ACK (which can also cover the whole series) and two weeks of grace on the xfs list.
That sounds like a good balance to me. But rest assured, if there is a patch that is more than trivial, and I am not sure about, I would ping the relevant developer myself and not trust the tests alone.
Ted has asked this on this thread [1].
[1] https://lore.kernel.org/linux-fsdevel/YieG8rZkgnfwygyu@mit.edu/
I asked this in my reply [2] to Darrick's email, but got no reply:
:Should we post to xfs list and wait for explicit ACK/RVB on every patch?
:Should we post to xfs list and if no objections are raised post to stable?
[2] https://lore.kernel.org/linux-xfs/CAOQ4uxjtOJ_=65MXVv0Ry0Z224dBxeLJ44FB_O-Nr...
TBH there have been enough stable process discussion threads in the past month that I've probably lost track of ... well, clearly, those two. :/
Since I had no explicit rules, I used my common sense, which is a recipe for misunderstandings... :-/
I posted the candidates one week ago, so I thought everyone had the opportunity to comment. You gave me comments on patches 1 and 7 so I had assumed that you had seen the entire series and had no objections to the rest.
I incorporated your feedback and wrote my plans in this email [3]
I'm going to offer my (probably controversial) opinion here: I would like to delegate /all/ of the power and responsibility for stable maintenance to all of you (Amir/Leah/Chandan/etc.) who are doing the work. What Amir did here (send a candidate patchset, take suggestions for a week, run the batch through fstests) is more or less what I'd expect from whoever owns the LTS backport process.
Frankly, I don't see so much difference between what I do between -rc1 and -rc4 and what Amir's doing -- propose a bunch of fixpatches, wait a few days, and if there aren't any strenuous objections, send them on their way. IOWS, I own whatever's going to the upstream tree; Amir owns whatever's going to 5.10; Leah and Chandan own whatever's going to 5.15; and (I guess?) Chandan owns whatever's going to 5.4.
I think Dave's afraid that if something goes wrong with an LTS kernel then the downstream consumers of those LTS kernels will show up on the list with bug reports and asking for fixes, and that will just put more pressure on the upstream maintainers since those consumers don't care about who they're talking to, they just want a resolution. But I'll let him express his thoughts.
I've been pondering this overnight, and I don't agree that the above scenario is the inevitable outcome. Are you (LTS branch maintainers) willing to put your names in the MAINTAINERS file for the LTS kernels and let us (upstream maintainers) point downstream consumers (and their bug report) of those branches at you? If so, then I see that as effective delegation of responsibilities to people who have taken ownership of the LTS branches, not merely blame shifting.
Yes, I am willing to take that responsibility. I curated the 5.10 backports, signed them and tested them. I see them as my responsibility to LTS users whether you list me in MAINTAINERS or you don't.
I can commit to being responsive to reports from LTS users. I do want to state, though, that my ability to continue to validate and post backports depends on the resources that Samsung provided Luis and me. If that changes in the future, we will need to re-assess.
If yes, then ... do as you like, you're the branch owner. I expect things to be rocky for a while, but it beats burning myself out with too many responsibilities that I have not been able to keep up with. It's probably better for the long term stability of each LTS branch if there's one or two people who are really familiar with how that LTS has been doing, instead of me trying and failing to multiplex.
Also, as Greg points out, at worst, you /can/ decide to revert something that initially looked good but later turned out smelly.
The one thing I /will/ caution you about -- watch out for changes that affect what gets persisted to disk. Those need more review because fixing those things up after the fact (look at the AGFL padding corrections we had to do to fix some uncaught problems upstream) is /not/ fun.
Duly noted.
Speaking on my own behalf, the candidates that I post go through careful manual inspection, which takes the wider context into account. I had been following the xfs list for over 5 years before starting this work, and the backport work started top down, from a complete overview of everything that got merged since v5.10.
:Just to make sure we are all on the same page.
:
:I have applied both patches to my test tree:
:1. 1cd738b13ae9 xfs: consider shutdown in bmapbt cursor delete assert
:2. 56486f307100 xfs: assert in xfs_btree_del_cursor should take into
:account error
:
:Patch #2 looks pretty safe and it only affects builds with XFS_WARN/DEBUG,
:so I am not too concerned about a soaking period.
:I plan to send it along with patch #1 to stable after a few more test runs.
Once again, I *assumed* that you saw that because this was part of an ongoing conversation, not some random email.
4 days later (after more testing) I did what I said I would do and posted the revised series to stable with feedback incorporated. This was also detailed in the cover letter to stable.
[3] https://lore.kernel.org/linux-xfs/CAOQ4uxhxLRTUfyhSy9D6nsGdVANrUOgRM8msVPVmF...
If this situation repeats itself in the future, I will post v2 to xfs list first.
Yes, please do. It has been customary to push a final patchset for 24h prior to submission just to see if anything shakes out at the last minute.
ok.
And I notice that you've already sent out yet another set of stable patches for review despite the paint not even being dry on these ones. Not to mention that there's a another set of different 5.15 stable patches out for review as well.
I will pace myself going forward and collaborate more closely with Leah. I have two years of kernel releases to catch up with, but once we reach the point of selecting patches from current releases, hopefully some of the reviews of candidates for different LTS kernels can be shared.
Admittedly, a new patchset every week is kind of a lot to pick through with everything /else/ going on ... particularly this week, since I /was/ busy trying to get the online fsck design doc review started, and dealing with a CVE for the grub xfs driver, and also trying to get the new LARP/NREXT64 itself sorted. That said, a couple of years is a lot of stuff to get through, so if we need to do a continuous trickle of patches to get caught up then so be it.
As long as you all end up doing a better job with LTS maintenance than I was doing, it's an improvement, even with the lumps and bumps. :)
This is not a sustainable process.
Seriously: slow down and stop being so damn aggressive. Let everyone catch up and build sustainable processes and timetables. If you keep going like this, you are going to break people.
I do not want that.
Let me explain my process. As I described in the meta-cover letter [4] for the multi part series:
:My intention is to post the parts for review on the xfs list on
:a ~weekly basis and forward them to stable only after xfs developers
:have had the chance to review the selection.
[4] https://lore.kernel.org/linux-xfs/20220525111715.2769700-1-amir73il@gmail.co...
To you it may have appeared that the "paint is not even dry on these ones", but from my perspective part 2 was already out of review and part 3 had already been spinning in xfstests for a week, so I wanted to post those patches and give as much time for review as needed.
My idea of sustainable was posting ~7 stable candidates per week. This pace may be a bit higher than the normal flow of fixes, but I do need to catch up with two years of fixes, so the rate has to be somewhat higher than the rate at which fixes go into upstream.
Pipelining is a bit more troublesome -- it's difficult for me to concentrate on two similarly named patchsets at the same time. If you have (say) part 3 running through its paces internally but only post part 3 after Greg lands part 2, that sounds acceptable to me.
That makes sense. Looks like part 4 is going to be cooked "well done" by the time I post it ;-) Luis' kdevops is gaining more stability as time goes by. I can sometimes leave it running for days without attending to sporadic testing issues.
I had to choose something based on no other feedback, but of course the idea is to take feedback and incorporate it into the process in order to make it sustainable.
I will do my best to amend the things that were broken in this posting. I am sure this is not the last time I am going to make mistakes. I am trying to fix a process that has been broken for years. I invest a lot of my time and energy in this effort.
My request is that you assume good intentions on my part and if there are rules that you want me to follow please spell them out, so I won't end up guessing wrong again.
I know, and thank you all for your willingness to take on LTS maintainership. There will undoubtedly be more, uh, learning opportunities, but I'll try to roll with them as gently as I can. Emotional self-regulation has not been one of my strong points the last year or so.
Thank you for this email. It helps a lot to clarify the process!
Amir.