Sasha,
you merged my last set of XFS fixes. I asked for one patch to not be merged yet as one issue was not yet properly fixed. After some further review I have identified commits which do fix the kernel crash reported on kz#204223 [0] with generic/388, this patch set applies on top of the last one I sent you.
These commits do quite a bit of code refactoring, and the actual fix lies hidden in the last commit by Darrick. Due to the amount of changes trying to extract the fix is riskier than just carring the code refactoring. If we're OK with the code refactor for stable, its my recommendation we keep the changes to match more with upstream and benefit from other fixes. The code refactoring was merged on v4.20 and Darrick's fix is the only fix upstream since the code was merged.
If others disagree with this approach please speak up.
I've run a full set of fstests against the following sections 12 times and have found no regressions against the baseline:
xfs xfs_logdev xfs_nocrc_512 xfs_nocrc xfs_realtimedev xfs_reflink_1024 xfs_reflink_dev
Review from others is appreciated.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204223
Allison Henderson (4): xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h xfs: Add helper function xfs_attr_try_sf_addname xfs: Add attibute set and helper functions xfs: Add attibute remove and helper functions
Brian Foster (1): xfs: don't trip over uninitialized buffer on extent read of corrupted inode
Darrick J. Wong (1): xfs: always rejoin held resources during defer roll
fs/xfs/libxfs/xfs_attr.c | 231 ++++++++++++++++++--------------- fs/xfs/{ => libxfs}/xfs_attr.h | 2 + fs/xfs/libxfs/xfs_bmap.c | 54 +++++--- fs/xfs/libxfs/xfs_bmap.h | 1 + fs/xfs/libxfs/xfs_defer.c | 14 +- fs/xfs/xfs_dquot.c | 17 +-- 6 files changed, 183 insertions(+), 136 deletions(-) rename fs/xfs/{ => libxfs}/xfs_attr.h (98%)
From: Brian Foster bfoster@redhat.com
commit 6958d11f77d45db80f7e22a21a74d4d5f44dc667 upstream.
We've had rather rare reports of bmap btree block corruption where the bmap root block has a level count of zero. The root cause of the corruption is so far unknown. We do have verifier checks to detect this form of on-disk corruption, but this doesn't cover a memory corruption variant of the problem. The latter is a reasonable possibility because the root block is part of the inode fork and can reside in-core for some time before inode extents are read.
If this occurs, it leads to a system crash such as the following:
BUG: unable to handle kernel paging request at ffffffff00000221 PF error: [normal kernel read fault] ... RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs] ... Call Trace: xfs_iread_extents+0x379/0x540 [xfs] xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs] ? xfs_attr_get+0xd1/0x120 [xfs] ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_iomap_begin+0x4c4/0x6d0 [xfs] ? __vfs_getxattr+0x53/0x70 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_apply+0x63/0x130 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_file_buffered_write+0x62/0x90 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs] __vfs_write+0x150/0x1b0 vfs_write+0xba/0x1c0 ksys_pwrite64+0x64/0xa0 do_syscall_64+0x5a/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The crash occurs because xfs_iread_extents() attempts to release an uninitialized buffer pointer as the level == 0 value prevented the buffer from ever being allocated or read. Change the level > 0 assert to an explicit error check in xfs_iread_extents() to avoid crashing the kernel in the event of localized, in-core inode corruption.
Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- fs/xfs/libxfs/xfs_bmap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 3a496ffe6551..ab2465bc413a 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -1178,7 +1178,10 @@ xfs_iread_extents( * Root level must use BMAP_BROOT_PTR_ADDR macro to get ptr out. */ level = be16_to_cpu(block->bb_level); - ASSERT(level > 0); + if (unlikely(level == 0)) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp); + return -EFSCORRUPTED; + } pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes); bno = be64_to_cpu(*pp);
From: Allison Henderson allison.henderson@oracle.com
commit e2421f0b5ff3ce279573036f5cfcb0ce28b422a9 upstream.
This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h since xfs_attr.c is in libxfs. We will need these later in xfsprogs.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- fs/xfs/{ => libxfs}/xfs_attr.h | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename fs/xfs/{ => libxfs}/xfs_attr.h (100%)
diff --git a/fs/xfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h similarity index 100% rename from fs/xfs/xfs_attr.h rename to fs/xfs/libxfs/xfs_attr.h
From: Allison Henderson allison.henderson@oracle.com
commit 4c74a56b9de76bb6b581274b76b52535ad77c2a7 upstream.
This patch adds a subroutine xfs_attr_try_sf_addname used by xfs_attr_set. This subrotine will attempt to add the attribute name specified in args in shortform, as well and perform error handling previously done in xfs_attr_set.
This patch helps to pre-simplify xfs_attr_set for reviewing purposes and reduce indentation. New function will be added in the next patch.
[dgc: moved commit to helper function, too.]
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- fs/xfs/libxfs/xfs_attr.c | 53 +++++++++++++++++++++++----------------- 1 file changed, 30 insertions(+), 23 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index c6299f82a6e4..c15a1debec90 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -191,6 +191,33 @@ xfs_attr_calc_size( return nblks; }
+STATIC int +xfs_attr_try_sf_addname( + struct xfs_inode *dp, + struct xfs_da_args *args) +{ + + struct xfs_mount *mp = dp->i_mount; + int error, error2; + + error = xfs_attr_shortform_addname(args); + if (error == -ENOSPC) + return error; + + /* + * Commit the shortform mods, and we're done. + * NOTE: this is also the error path (EEXIST, etc). + */ + if (!error && (args->flags & ATTR_KERNOTIME) == 0) + xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG); + + if (mp->m_flags & XFS_MOUNT_WSYNC) + xfs_trans_set_sync(args->trans); + + error2 = xfs_trans_commit(args->trans); + return error ? error : error2; +} + int xfs_attr_set( struct xfs_inode *dp, @@ -204,7 +231,7 @@ xfs_attr_set( struct xfs_da_args args; struct xfs_trans_res tres; int rsvd = (flags & ATTR_ROOT) != 0; - int error, err2, local; + int error, local;
XFS_STATS_INC(mp, xs_attr_set);
@@ -281,30 +308,10 @@ xfs_attr_set( * Try to add the attr to the attribute list in * the inode. */ - error = xfs_attr_shortform_addname(&args); + error = xfs_attr_try_sf_addname(dp, &args); if (error != -ENOSPC) { - /* - * Commit the shortform mods, and we're done. - * NOTE: this is also the error path (EEXIST, etc). - */ - ASSERT(args.trans != NULL); - - /* - * If this is a synchronous mount, make sure that - * the transaction goes to disk before returning - * to the user. - */ - if (mp->m_flags & XFS_MOUNT_WSYNC) - xfs_trans_set_sync(args.trans); - - if (!error && (flags & ATTR_KERNOTIME) == 0) { - xfs_trans_ichgtime(args.trans, dp, - XFS_ICHGTIME_CHG); - } - err2 = xfs_trans_commit(args.trans); xfs_iunlock(dp, XFS_ILOCK_EXCL); - - return error ? error : err2; + return error; }
/*
From: Allison Henderson allison.henderson@oracle.com
commit 2f3cd8091963810d85e6a5dd6ed1247e10e9e6f2 upstream.
This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff. These sub-routines set the attributes specified in @args. We will use this later for setting parent pointers as a deferred attribute operation.
[dgc: remove attr fork init code from xfs_attr_set_args().] [dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.] [dgc: correct sf add error handling.]
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- fs/xfs/libxfs/xfs_attr.c | 151 +++++++++++++++++++++------------------ fs/xfs/libxfs/xfs_attr.h | 1 + fs/xfs/libxfs/xfs_bmap.c | 49 ++++++++----- fs/xfs/libxfs/xfs_bmap.h | 1 + 4 files changed, 115 insertions(+), 87 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index c15a1debec90..25431ddba1fa 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -215,9 +215,80 @@ xfs_attr_try_sf_addname( xfs_trans_set_sync(args->trans);
error2 = xfs_trans_commit(args->trans); + args->trans = NULL; return error ? error : error2; }
+/* + * Set the attribute specified in @args. + */ +int +xfs_attr_set_args( + struct xfs_da_args *args, + struct xfs_buf **leaf_bp) +{ + struct xfs_inode *dp = args->dp; + int error; + + /* + * If the attribute list is non-existent or a shortform list, + * upgrade it to a single-leaf-block attribute list. + */ + if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL || + (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS && + dp->i_d.di_anextents == 0)) { + + /* + * Build initial attribute list (if required). + */ + if (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) + xfs_attr_shortform_create(args); + + /* + * Try to add the attr to the attribute list in the inode. + */ + error = xfs_attr_try_sf_addname(dp, args); + if (error != -ENOSPC) + return error; + + /* + * It won't fit in the shortform, transform to a leaf block. + * GROT: another possible req'mt for a double-split btree op. + */ + error = xfs_attr_shortform_to_leaf(args, leaf_bp); + if (error) + return error; + + /* + * Prevent the leaf buffer from being unlocked so that a + * concurrent AIL push cannot grab the half-baked leaf + * buffer and run into problems with the write verifier. + */ + xfs_trans_bhold(args->trans, *leaf_bp); + + error = xfs_defer_finish(&args->trans); + if (error) + return error; + + /* + * Commit the leaf transformation. We'll need another + * (linked) transaction to add the new attribute to the + * leaf. + */ + error = xfs_trans_roll_inode(&args->trans, dp); + if (error) + return error; + xfs_trans_bjoin(args->trans, *leaf_bp); + *leaf_bp = NULL; + } + + if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) + error = xfs_attr_leaf_addname(args); + else + error = xfs_attr_node_addname(args); + return error; +} + int xfs_attr_set( struct xfs_inode *dp, @@ -282,73 +353,17 @@ xfs_attr_set( error = xfs_trans_reserve_quota_nblks(args.trans, dp, args.total, 0, rsvd ? XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES : XFS_QMOPT_RES_REGBLKS); - if (error) { - xfs_iunlock(dp, XFS_ILOCK_EXCL); - xfs_trans_cancel(args.trans); - return error; - } + if (error) + goto out_trans_cancel;
xfs_trans_ijoin(args.trans, dp, 0); - - /* - * If the attribute list is non-existent or a shortform list, - * upgrade it to a single-leaf-block attribute list. - */ - if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL || - (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS && - dp->i_d.di_anextents == 0)) { - - /* - * Build initial attribute list (if required). - */ - if (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) - xfs_attr_shortform_create(&args); - - /* - * Try to add the attr to the attribute list in - * the inode. - */ - error = xfs_attr_try_sf_addname(dp, &args); - if (error != -ENOSPC) { - xfs_iunlock(dp, XFS_ILOCK_EXCL); - return error; - } - - /* - * It won't fit in the shortform, transform to a leaf block. - * GROT: another possible req'mt for a double-split btree op. - */ - error = xfs_attr_shortform_to_leaf(&args, &leaf_bp); - if (error) - goto out; - /* - * Prevent the leaf buffer from being unlocked so that a - * concurrent AIL push cannot grab the half-baked leaf - * buffer and run into problems with the write verifier. - */ - xfs_trans_bhold(args.trans, leaf_bp); - error = xfs_defer_finish(&args.trans); - if (error) - goto out; - - /* - * Commit the leaf transformation. We'll need another (linked) - * transaction to add the new attribute to the leaf, which - * means that we have to hold & join the leaf buffer here too. - */ - error = xfs_trans_roll_inode(&args.trans, dp); - if (error) - goto out; - xfs_trans_bjoin(args.trans, leaf_bp); - leaf_bp = NULL; - } - - if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) - error = xfs_attr_leaf_addname(&args); - else - error = xfs_attr_node_addname(&args); + error = xfs_attr_set_args(&args, &leaf_bp); if (error) - goto out; + goto out_release_leaf; + if (!args.trans) { + /* shortform attribute has already been committed */ + goto out_unlock; + }
/* * If this is a synchronous mount, make sure that the @@ -365,17 +380,17 @@ xfs_attr_set( */ xfs_trans_log_inode(args.trans, dp, XFS_ILOG_CORE); error = xfs_trans_commit(args.trans); +out_unlock: xfs_iunlock(dp, XFS_ILOCK_EXCL); - return error;
-out: +out_release_leaf: if (leaf_bp) xfs_trans_brelse(args.trans, leaf_bp); +out_trans_cancel: if (args.trans) xfs_trans_cancel(args.trans); - xfs_iunlock(dp, XFS_ILOCK_EXCL); - return error; + goto out_unlock; }
/* diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h index 033ff8c478e2..f608ac8f306f 100644 --- a/fs/xfs/libxfs/xfs_attr.h +++ b/fs/xfs/libxfs/xfs_attr.h @@ -140,6 +140,7 @@ int xfs_attr_get(struct xfs_inode *ip, const unsigned char *name, unsigned char *value, int *valuelenp, int flags); int xfs_attr_set(struct xfs_inode *dp, const unsigned char *name, unsigned char *value, int valuelen, int flags); +int xfs_attr_set_args(struct xfs_da_args *args, struct xfs_buf **leaf_bp); int xfs_attr_remove(struct xfs_inode *dp, const unsigned char *name, int flags); int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize, int flags, struct attrlist_cursor_kern *cursor); diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index ab2465bc413a..06a7da8dbda5 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -1019,6 +1019,34 @@ xfs_bmap_add_attrfork_local( return -EFSCORRUPTED; }
+/* Set an inode attr fork off based on the format */ +int +xfs_bmap_set_attrforkoff( + struct xfs_inode *ip, + int size, + int *version) +{ + switch (ip->i_d.di_format) { + case XFS_DINODE_FMT_DEV: + ip->i_d.di_forkoff = roundup(sizeof(xfs_dev_t), 8) >> 3; + break; + case XFS_DINODE_FMT_LOCAL: + case XFS_DINODE_FMT_EXTENTS: + case XFS_DINODE_FMT_BTREE: + ip->i_d.di_forkoff = xfs_attr_shortform_bytesfit(ip, size); + if (!ip->i_d.di_forkoff) + ip->i_d.di_forkoff = xfs_default_attroffset(ip) >> 3; + else if ((ip->i_mount->m_flags & XFS_MOUNT_ATTR2) && version) + *version = 2; + break; + default: + ASSERT(0); + return -EINVAL; + } + + return 0; +} + /* * Convert inode from non-attributed to attributed. * Must not be in a transaction, ip must not be locked. @@ -1070,26 +1098,9 @@ xfs_bmap_add_attrfork(
xfs_trans_ijoin(tp, ip, 0); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - - switch (ip->i_d.di_format) { - case XFS_DINODE_FMT_DEV: - ip->i_d.di_forkoff = roundup(sizeof(xfs_dev_t), 8) >> 3; - break; - case XFS_DINODE_FMT_LOCAL: - case XFS_DINODE_FMT_EXTENTS: - case XFS_DINODE_FMT_BTREE: - ip->i_d.di_forkoff = xfs_attr_shortform_bytesfit(ip, size); - if (!ip->i_d.di_forkoff) - ip->i_d.di_forkoff = xfs_default_attroffset(ip) >> 3; - else if (mp->m_flags & XFS_MOUNT_ATTR2) - version = 2; - break; - default: - ASSERT(0); - error = -EINVAL; + error = xfs_bmap_set_attrforkoff(ip, size, &version); + if (error) goto trans_cancel; - } - ASSERT(ip->i_afp == NULL); ip->i_afp = kmem_zone_zalloc(xfs_ifork_zone, KM_SLEEP); ip->i_afp->if_flags = XFS_IFEXTENTS; diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index b6e9b639e731..488dc8860fd7 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -183,6 +183,7 @@ void xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno, xfs_filblks_t len); void xfs_trim_extent_eof(struct xfs_bmbt_irec *, struct xfs_inode *); int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd); +int xfs_bmap_set_attrforkoff(struct xfs_inode *ip, int size, int *version); void xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork); void __xfs_bmap_add_free(struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, struct xfs_owner_info *oinfo,
From: Allison Henderson allison.henderson@oracle.com
commit 068f985a9e5ec70fde58d8f679994fdbbd093a36 upstream.
This patch adds xfs_attr_remove_args. These sub-routines remove the attributes specified in @args. We will use this later for setting parent pointers as a deferred attribute operation.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- fs/xfs/libxfs/xfs_attr.c | 36 +++++++++++++++++++++++++----------- fs/xfs/libxfs/xfs_attr.h | 1 + 2 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 25431ddba1fa..844ed87b1900 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -289,6 +289,30 @@ xfs_attr_set_args( return error; }
+/* + * Remove the attribute specified in @args. + */ +int +xfs_attr_remove_args( + struct xfs_da_args *args) +{ + struct xfs_inode *dp = args->dp; + int error; + + if (!xfs_inode_hasattr(dp)) { + error = -ENOATTR; + } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) { + ASSERT(dp->i_afp->if_flags & XFS_IFINLINE); + error = xfs_attr_shortform_remove(args); + } else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) { + error = xfs_attr_leaf_removename(args); + } else { + error = xfs_attr_node_removename(args); + } + + return error; +} + int xfs_attr_set( struct xfs_inode *dp, @@ -445,17 +469,7 @@ xfs_attr_remove( */ xfs_trans_ijoin(args.trans, dp, 0);
- if (!xfs_inode_hasattr(dp)) { - error = -ENOATTR; - } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) { - ASSERT(dp->i_afp->if_flags & XFS_IFINLINE); - error = xfs_attr_shortform_remove(&args); - } else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) { - error = xfs_attr_leaf_removename(&args); - } else { - error = xfs_attr_node_removename(&args); - } - + error = xfs_attr_remove_args(&args); if (error) goto out;
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h index f608ac8f306f..bdf52a333f3f 100644 --- a/fs/xfs/libxfs/xfs_attr.h +++ b/fs/xfs/libxfs/xfs_attr.h @@ -142,6 +142,7 @@ int xfs_attr_set(struct xfs_inode *dp, const unsigned char *name, unsigned char *value, int valuelen, int flags); int xfs_attr_set_args(struct xfs_da_args *args, struct xfs_buf **leaf_bp); int xfs_attr_remove(struct xfs_inode *dp, const unsigned char *name, int flags); +int xfs_attr_remove_args(struct xfs_da_args *args); int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize, int flags, struct attrlist_cursor_kern *cursor);
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 710d707d2fa9cf4c2aa9def129e71e99513466ea upstream.
During testing of xfs/141 on a V4 filesystem, I observed some inconsistent behavior with regards to resources that are held (i.e. remain locked) across a defer roll. The transaction roll always gives the defer roll function a new transaction, even if committing the old transaction fails. However, the defer roll function only rejoins the held resources if the transaction commit succeedied. This means that callers of defer roll have to figure out whether the held resources are attached to the transaction being passed back.
Worse yet, if the defer roll was part of a defer finish call, we have a third possibility: the defer finish could pass back a dirty transaction with dirty held resources and an error code.
The only sane way to handle all of these scenarios is to require that the code that held the resource either cancel the transaction before unlocking and releasing the resources, or use functions that detach resources from a transaction properly (e.g. xfs_trans_brelse) if they need to drop the reference before committing or cancelling the transaction.
In order to make this so, change the defer roll code to join held resources to the new transaction unconditionally and fix all the bhold callers to release the held buffers correctly.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com [mcgrof: fixes kz#204223 ] Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- fs/xfs/libxfs/xfs_attr.c | 35 ++++++++++++----------------------- fs/xfs/libxfs/xfs_attr.h | 2 +- fs/xfs/libxfs/xfs_defer.c | 14 +++++++++----- fs/xfs/xfs_dquot.c | 17 +++++++++-------- 4 files changed, 31 insertions(+), 37 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 844ed87b1900..6410d3e00ce0 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -224,10 +224,10 @@ xfs_attr_try_sf_addname( */ int xfs_attr_set_args( - struct xfs_da_args *args, - struct xfs_buf **leaf_bp) + struct xfs_da_args *args) { struct xfs_inode *dp = args->dp; + struct xfs_buf *leaf_bp = NULL; int error;
/* @@ -255,7 +255,7 @@ xfs_attr_set_args( * It won't fit in the shortform, transform to a leaf block. * GROT: another possible req'mt for a double-split btree op. */ - error = xfs_attr_shortform_to_leaf(args, leaf_bp); + error = xfs_attr_shortform_to_leaf(args, &leaf_bp); if (error) return error;
@@ -263,23 +263,16 @@ xfs_attr_set_args( * Prevent the leaf buffer from being unlocked so that a * concurrent AIL push cannot grab the half-baked leaf * buffer and run into problems with the write verifier. + * Once we're done rolling the transaction we can release + * the hold and add the attr to the leaf. */ - xfs_trans_bhold(args->trans, *leaf_bp); - + xfs_trans_bhold(args->trans, leaf_bp); error = xfs_defer_finish(&args->trans); - if (error) - return error; - - /* - * Commit the leaf transformation. We'll need another - * (linked) transaction to add the new attribute to the - * leaf. - */ - error = xfs_trans_roll_inode(&args->trans, dp); - if (error) + xfs_trans_bhold_release(args->trans, leaf_bp); + if (error) { + xfs_trans_brelse(args->trans, leaf_bp); return error; - xfs_trans_bjoin(args->trans, *leaf_bp); - *leaf_bp = NULL; + } }
if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) @@ -322,7 +315,6 @@ xfs_attr_set( int flags) { struct xfs_mount *mp = dp->i_mount; - struct xfs_buf *leaf_bp = NULL; struct xfs_da_args args; struct xfs_trans_res tres; int rsvd = (flags & ATTR_ROOT) != 0; @@ -381,9 +373,9 @@ xfs_attr_set( goto out_trans_cancel;
xfs_trans_ijoin(args.trans, dp, 0); - error = xfs_attr_set_args(&args, &leaf_bp); + error = xfs_attr_set_args(&args); if (error) - goto out_release_leaf; + goto out_trans_cancel; if (!args.trans) { /* shortform attribute has already been committed */ goto out_unlock; @@ -408,9 +400,6 @@ xfs_attr_set( xfs_iunlock(dp, XFS_ILOCK_EXCL); return error;
-out_release_leaf: - if (leaf_bp) - xfs_trans_brelse(args.trans, leaf_bp); out_trans_cancel: if (args.trans) xfs_trans_cancel(args.trans); diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h index bdf52a333f3f..cc04ee0aacfb 100644 --- a/fs/xfs/libxfs/xfs_attr.h +++ b/fs/xfs/libxfs/xfs_attr.h @@ -140,7 +140,7 @@ int xfs_attr_get(struct xfs_inode *ip, const unsigned char *name, unsigned char *value, int *valuelenp, int flags); int xfs_attr_set(struct xfs_inode *dp, const unsigned char *name, unsigned char *value, int valuelen, int flags); -int xfs_attr_set_args(struct xfs_da_args *args, struct xfs_buf **leaf_bp); +int xfs_attr_set_args(struct xfs_da_args *args); int xfs_attr_remove(struct xfs_inode *dp, const unsigned char *name, int flags); int xfs_attr_remove_args(struct xfs_da_args *args); int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize, diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index e792b167150a..c52beee31836 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -266,13 +266,15 @@ xfs_defer_trans_roll(
trace_xfs_defer_trans_roll(tp, _RET_IP_);
- /* Roll the transaction. */ + /* + * Roll the transaction. Rolling always given a new transaction (even + * if committing the old one fails!) to hand back to the caller, so we + * join the held resources to the new transaction so that we always + * return with the held resources joined to @tpp, no matter what + * happened. + */ error = xfs_trans_roll(tpp); tp = *tpp; - if (error) { - trace_xfs_defer_trans_roll_error(tp, error); - return error; - }
/* Rejoin the joined inodes. */ for (i = 0; i < ipcount; i++) @@ -284,6 +286,8 @@ xfs_defer_trans_roll( xfs_trans_bhold(tp, bplist[i]); }
+ if (error) + trace_xfs_defer_trans_roll_error(tp, error); return error; }
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 87e6dd5326d5..a1af984e4913 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -277,7 +277,8 @@ xfs_dquot_set_prealloc_limits(struct xfs_dquot *dqp)
/* * Ensure that the given in-core dquot has a buffer on disk backing it, and - * return the buffer. This is called when the bmapi finds a hole. + * return the buffer locked and held. This is called when the bmapi finds a + * hole. */ STATIC int xfs_dquot_disk_alloc( @@ -355,13 +356,14 @@ xfs_dquot_disk_alloc( * If everything succeeds, the caller of this function is returned a * buffer that is locked and held to the transaction. The caller * is responsible for unlocking any buffer passed back, either - * manually or by committing the transaction. + * manually or by committing the transaction. On error, the buffer is + * released and not passed back. */ xfs_trans_bhold(tp, bp); error = xfs_defer_finish(tpp); - tp = *tpp; if (error) { - xfs_buf_relse(bp); + xfs_trans_bhold_release(*tpp, bp); + xfs_trans_brelse(*tpp, bp); return error; } *bpp = bp; @@ -521,7 +523,6 @@ xfs_qm_dqread_alloc( struct xfs_buf **bpp) { struct xfs_trans *tp; - struct xfs_buf *bp; int error;
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_dqalloc, @@ -529,7 +530,7 @@ xfs_qm_dqread_alloc( if (error) goto err;
- error = xfs_dquot_disk_alloc(&tp, dqp, &bp); + error = xfs_dquot_disk_alloc(&tp, dqp, bpp); if (error) goto err_cancel;
@@ -539,10 +540,10 @@ xfs_qm_dqread_alloc( * Buffer was held to the transaction, so we have to unlock it * manually here because we're not passing it back. */ - xfs_buf_relse(bp); + xfs_buf_relse(*bpp); + *bpp = NULL; goto err; } - *bpp = bp; return 0;
err_cancel:
On Wed, Jul 24, 2019 at 06:34:45AM +0000, Luis Chamberlain wrote:
Sasha,
you merged my last set of XFS fixes. I asked for one patch to not be merged yet as one issue was not yet properly fixed. After some further review I have identified commits which do fix the kernel crash reported on kz#204223 [0] with generic/388, this patch set applies on top of the last one I sent you.
These commits do quite a bit of code refactoring, and the actual fix lies hidden in the last commit by Darrick. Due to the amount of changes trying to extract the fix is riskier than just carring the code refactoring. If we're OK with the code refactor for stable, its my recommendation we keep the changes to match more with upstream and benefit from other fixes. The code refactoring was merged on v4.20 and Darrick's fix is the only fix upstream since the code was merged.
If others disagree with this approach please speak up.
I've run a full set of fstests against the following sections 12 times and have found no regressions against the baseline:
xfs xfs_logdev xfs_nocrc_512 xfs_nocrc xfs_realtimedev xfs_reflink_1024 xfs_reflink_dev
Review from others is appreciated.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204223
Allison Henderson (4): xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h xfs: Add helper function xfs_attr_try_sf_addname xfs: Add attibute set and helper functions xfs: Add attibute remove and helper functions
Brian Foster (1): xfs: don't trip over uninitialized buffer on extent read of corrupted inode
Darrick J. Wong (1): xfs: always rejoin held resources during defer roll
fs/xfs/libxfs/xfs_attr.c | 231 ++++++++++++++++++--------------- fs/xfs/{ => libxfs}/xfs_attr.h | 2 + fs/xfs/libxfs/xfs_bmap.c | 54 +++++--- fs/xfs/libxfs/xfs_bmap.h | 1 + fs/xfs/libxfs/xfs_defer.c | 14 +- fs/xfs/xfs_dquot.c | 17 +-- 6 files changed, 183 insertions(+), 136 deletions(-) rename fs/xfs/{ => libxfs}/xfs_attr.h (98%)
*poke*
BTW I'm off on vacation for a while after today.
Luis
On Fri, Aug 23, 2019 at 03:44:59PM +0000, Luis Chamberlain wrote:
On Wed, Jul 24, 2019 at 06:34:45AM +0000, Luis Chamberlain wrote:
Sasha,
you merged my last set of XFS fixes. I asked for one patch to not be merged yet as one issue was not yet properly fixed. After some further review I have identified commits which do fix the kernel crash reported on kz#204223 [0] with generic/388, this patch set applies on top of the last one I sent you.
These commits do quite a bit of code refactoring, and the actual fix lies hidden in the last commit by Darrick. Due to the amount of changes trying to extract the fix is riskier than just carring the code refactoring. If we're OK with the code refactor for stable, its my recommendation we keep the changes to match more with upstream and benefit from other fixes. The code refactoring was merged on v4.20 and Darrick's fix is the only fix upstream since the code was merged.
If others disagree with this approach please speak up.
I've run a full set of fstests against the following sections 12 times and have found no regressions against the baseline:
xfs xfs_logdev xfs_nocrc_512 xfs_nocrc xfs_realtimedev xfs_reflink_1024 xfs_reflink_dev
Review from others is appreciated.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204223
Allison Henderson (4): xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h xfs: Add helper function xfs_attr_try_sf_addname xfs: Add attibute set and helper functions xfs: Add attibute remove and helper functions
Brian Foster (1): xfs: don't trip over uninitialized buffer on extent read of corrupted inode
Darrick J. Wong (1): xfs: always rejoin held resources during defer roll
fs/xfs/libxfs/xfs_attr.c | 231 ++++++++++++++++++--------------- fs/xfs/{ => libxfs}/xfs_attr.h | 2 + fs/xfs/libxfs/xfs_bmap.c | 54 +++++--- fs/xfs/libxfs/xfs_bmap.h | 1 + fs/xfs/libxfs/xfs_defer.c | 14 +- fs/xfs/xfs_dquot.c | 17 +-- 6 files changed, 183 insertions(+), 136 deletions(-) rename fs/xfs/{ => libxfs}/xfs_attr.h (98%)
*poke*
I was hoping for an ack from the xfs folks...
BTW I'm off on vacation for a while after today.
I saw the ticket; have fun! :)
-- Thanks, Sasha
linux-stable-mirror@lists.linaro.org