From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions: 1. user (or group) quotas are enabled 2. A/ B/ and C/ have different owner uids (or gids) 3. A/ blocks shrinks after remove of entry C/ 4. B/ blocks grows before adding of entry C/ 5. A/ ino <= XFS_DIR2_MAX_SHORT_INUM 6. B/ ino > XFS_DIR2_MAX_SHORT_INUM 7. C/ is converted from sf to block format, because its parent entry needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com ---
Christoph,
This is a cognitive challenge. can you say what you where thinking in 2013 when making the comment change in the Fixes commit? Is my speculation above correct?
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
What do you all think about this?
Thanks, Amir.
fs/xfs/xfs_dquot.c | 41 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_dquot.h | 1 + fs/xfs/xfs_qm.h | 2 +- fs/xfs/xfs_trans_dquot.c | 15 ++++++++++----- 4 files changed, 53 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index c15d61d47a06..6b05d47aa19b 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1360,6 +1360,47 @@ xfs_dqlock2( } }
+static int +xfs_dqtrx_cmp( + const void *a, + const void *b) +{ + const struct xfs_dqtrx *qa = a; + const struct xfs_dqtrx *qb = b; + + if (qa->qt_dquot->q_id > qb->qt_dquot->q_id) + return 1; + if (qa->qt_dquot->q_id < qb->qt_dquot->q_id) + return -1; + return 0; +} + +void +xfs_dqlockn( + struct xfs_dqtrx *q) +{ + unsigned int i; + + BUILD_BUG_ON(XFS_QM_TRANS_MAXDQS > MAX_LOCKDEP_SUBCLASSES); + + /* Sort in order of dquot id, do not allow duplicates */ + for (i = 0; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++) { + unsigned int j; + + for (j = 0; j < i; j++) + ASSERT(q[i].qt_dquot != q[j].qt_dquot); + } + if (i == 0) + return; + + sort(q, i, sizeof(struct xfs_dqtrx), xfs_dqtrx_cmp, NULL); + + mutex_lock(&q[0].qt_dquot->q_qlock); + for (i = 1; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++) + mutex_lock_nested(&q[i].qt_dquot->q_qlock, + XFS_QLOCK_NESTED + i - 1); +} + int __init xfs_qm_init(void) { diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h index 6b5e3cf40c8b..0e954f88811f 100644 --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -231,6 +231,7 @@ int xfs_qm_dqget_uncached(struct xfs_mount *mp, void xfs_qm_dqput(struct xfs_dquot *dqp);
void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *); +void xfs_dqlockn(struct xfs_dqtrx *q);
void xfs_dquot_set_prealloc_limits(struct xfs_dquot *);
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h index 442a0f97a9d4..f75c12c4c6a0 100644 --- a/fs/xfs/xfs_qm.h +++ b/fs/xfs/xfs_qm.h @@ -121,7 +121,7 @@ enum { XFS_QM_TRANS_PRJ, XFS_QM_TRANS_DQTYPES }; -#define XFS_QM_TRANS_MAXDQS 2 +#define XFS_QM_TRANS_MAXDQS 5 struct xfs_dquot_acct { struct xfs_dqtrx dqs[XFS_QM_TRANS_DQTYPES][XFS_QM_TRANS_MAXDQS]; }; diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c index 955c457e585a..99a03acd4488 100644 --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -268,24 +268,29 @@ xfs_trans_mod_dquot(
/* * Given an array of dqtrx structures, lock all the dquots associated and join - * them to the transaction, provided they have been modified. We know that the - * highest number of dquots of one type - usr, grp and prj - involved in a - * transaction is 3 so we don't need to make this very generic. + * them to the transaction, provided they have been modified. */ STATIC void xfs_trans_dqlockedjoin( struct xfs_trans *tp, struct xfs_dqtrx *q) { + unsigned int i; ASSERT(q[0].qt_dquot != NULL); if (q[1].qt_dquot == NULL) { xfs_dqlock(q[0].qt_dquot); xfs_trans_dqjoin(tp, q[0].qt_dquot); - } else { - ASSERT(XFS_QM_TRANS_MAXDQS == 2); + } else if (q[2].qt_dquot == NULL) { xfs_dqlock2(q[0].qt_dquot, q[1].qt_dquot); xfs_trans_dqjoin(tp, q[0].qt_dquot); xfs_trans_dqjoin(tp, q[1].qt_dquot); + } else { + xfs_dqlockn(q); + for (i = 0; i < XFS_QM_TRANS_MAXDQS; i++) { + if (q[i].qt_dquot == NULL) + break; + xfs_trans_dqjoin(tp, q[i].qt_dquot); + } } }
On Sat, Sep 13, 2025 at 05:05:02AM +0200, Amir Goldstein wrote:
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions:
- user (or group) quotas are enabled
- A/ B/ and C/ have different owner uids (or gids)
- A/ blocks shrinks after remove of entry C/
- B/ blocks grows before adding of entry C/
- A/ ino <= XFS_DIR2_MAX_SHORT_INUM
- B/ ino > XFS_DIR2_MAX_SHORT_INUM
- C/ is converted from sf to block format, because its parent entry needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
Heh. Indeed, you only need MAXDQS==5 for filesystems that support parent pointers, because only on those filesystems can you end up needing to allocate a xattr block either to the new whiteout file or free one from the file being unlinked.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
Agree.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
I interpret the "OR" -> "and" change to reflect the V4 -> V5 transition where you actually can have all three dquot types because group/project quota are no longer mutually exclusive.
The "...involved in a transaction is 3" part I think is separate, and strange that XFS_QM_TRANS_MAXDQS wasn't updated.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com
Christoph,
This is a cognitive challenge. can you say what you where thinking in 2013 when making the comment change in the Fixes commit? Is my speculation above correct?
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
What do you all think about this?
I only think you need MAXDQS==5 for 6.12 to handle parent pointers.
The older kernels could have it set to 3 instead. struct xfs_dqtrx on a 6.17-rc6 kernel is 88 bytes. Stuffing 9 of them into struct xfs_dquot_acct instead of 15 means that the _acct struct is only 792 bytes instead of 1392, which means we can use the 1k slab instead of the 2k slab.
--D
Thanks, Amir.
fs/xfs/xfs_dquot.c | 41 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_dquot.h | 1 + fs/xfs/xfs_qm.h | 2 +- fs/xfs/xfs_trans_dquot.c | 15 ++++++++++----- 4 files changed, 53 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index c15d61d47a06..6b05d47aa19b 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1360,6 +1360,47 @@ xfs_dqlock2( } } +static int +xfs_dqtrx_cmp(
- const void *a,
- const void *b)
+{
- const struct xfs_dqtrx *qa = a;
- const struct xfs_dqtrx *qb = b;
- if (qa->qt_dquot->q_id > qb->qt_dquot->q_id)
return 1;
- if (qa->qt_dquot->q_id < qb->qt_dquot->q_id)
return -1;
- return 0;
+}
+void +xfs_dqlockn(
- struct xfs_dqtrx *q)
+{
- unsigned int i;
- BUILD_BUG_ON(XFS_QM_TRANS_MAXDQS > MAX_LOCKDEP_SUBCLASSES);
- /* Sort in order of dquot id, do not allow duplicates */
- for (i = 0; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++) {
unsigned int j;
for (j = 0; j < i; j++)
ASSERT(q[i].qt_dquot != q[j].qt_dquot);
- }
- if (i == 0)
return;
- sort(q, i, sizeof(struct xfs_dqtrx), xfs_dqtrx_cmp, NULL);
- mutex_lock(&q[0].qt_dquot->q_qlock);
- for (i = 1; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++)
mutex_lock_nested(&q[i].qt_dquot->q_qlock,
XFS_QLOCK_NESTED + i - 1);
+}
int __init xfs_qm_init(void) { diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h index 6b5e3cf40c8b..0e954f88811f 100644 --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -231,6 +231,7 @@ int xfs_qm_dqget_uncached(struct xfs_mount *mp, void xfs_qm_dqput(struct xfs_dquot *dqp); void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *); +void xfs_dqlockn(struct xfs_dqtrx *q); void xfs_dquot_set_prealloc_limits(struct xfs_dquot *); diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h index 442a0f97a9d4..f75c12c4c6a0 100644 --- a/fs/xfs/xfs_qm.h +++ b/fs/xfs/xfs_qm.h @@ -121,7 +121,7 @@ enum { XFS_QM_TRANS_PRJ, XFS_QM_TRANS_DQTYPES }; -#define XFS_QM_TRANS_MAXDQS 2 +#define XFS_QM_TRANS_MAXDQS 5 struct xfs_dquot_acct { struct xfs_dqtrx dqs[XFS_QM_TRANS_DQTYPES][XFS_QM_TRANS_MAXDQS]; }; diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c index 955c457e585a..99a03acd4488 100644 --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -268,24 +268,29 @@ xfs_trans_mod_dquot( /*
- Given an array of dqtrx structures, lock all the dquots associated and join
- them to the transaction, provided they have been modified. We know that the
- highest number of dquots of one type - usr, grp and prj - involved in a
- transaction is 3 so we don't need to make this very generic.
*/
- them to the transaction, provided they have been modified.
STATIC void xfs_trans_dqlockedjoin( struct xfs_trans *tp, struct xfs_dqtrx *q) {
- unsigned int i; ASSERT(q[0].qt_dquot != NULL); if (q[1].qt_dquot == NULL) { xfs_dqlock(q[0].qt_dquot); xfs_trans_dqjoin(tp, q[0].qt_dquot);
- } else {
ASSERT(XFS_QM_TRANS_MAXDQS == 2);
- } else if (q[2].qt_dquot == NULL) { xfs_dqlock2(q[0].qt_dquot, q[1].qt_dquot); xfs_trans_dqjoin(tp, q[0].qt_dquot); xfs_trans_dqjoin(tp, q[1].qt_dquot);
- } else {
xfs_dqlockn(q);
for (i = 0; i < XFS_QM_TRANS_MAXDQS; i++) {
if (q[i].qt_dquot == NULL)
break;
xfs_trans_dqjoin(tp, q[i].qt_dquot);
}}
} -- 2.47.1
On Mon, Sep 15, 2025 at 8:20 PM Darrick J. Wong djwong@kernel.org wrote:
On Sat, Sep 13, 2025 at 05:05:02AM +0200, Amir Goldstein wrote:
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions:
- user (or group) quotas are enabled
- A/ B/ and C/ have different owner uids (or gids)
- A/ blocks shrinks after remove of entry C/
- B/ blocks grows before adding of entry C/
- A/ ino <= XFS_DIR2_MAX_SHORT_INUM
- B/ ino > XFS_DIR2_MAX_SHORT_INUM
- C/ is converted from sf to block format, because its parent entry needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
Heh. Indeed, you only need MAXDQS==5 for filesystems that support parent pointers, because only on those filesystems can you end up needing to allocate a xattr block either to the new whiteout file or free one from the file being unlinked.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
Agree.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
I interpret the "OR" -> "and" change to reflect the V4 -> V5 transition where you actually can have all three dquot types because group/project quota are no longer mutually exclusive.
The "...involved in a transaction is 3" part I think is separate, and strange that XFS_QM_TRANS_MAXDQS wasn't updated.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com
Christoph,
This is a cognitive challenge. can you say what you where thinking in 2013 when making the comment change in the Fixes commit? Is my speculation above correct?
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
What do you all think about this?
I only think you need MAXDQS==5 for 6.12 to handle parent pointers.
Yes, of course. I just preferred to keep the 5 to avoid deviating from the upstream commit if there is no good reason to do so.
The older kernels could have it set to 3 instead. struct xfs_dqtrx on a 6.17-rc6 kernel is 88 bytes. Stuffing 9 of them into struct xfs_dquot_acct instead of 15 means that the _acct struct is only 792 bytes instead of 1392, which means we can use the 1k slab instead of the 2k slab.
Huh? there is only one xfs_dquot_acct per transaction. Does it really matter if it's 1k or 2k??
Am I missing something?
Thanks, Amir.
On Mon, Sep 15, 2025 at 10:20:40PM +0200, Amir Goldstein wrote:
On Mon, Sep 15, 2025 at 8:20 PM Darrick J. Wong djwong@kernel.org wrote:
On Sat, Sep 13, 2025 at 05:05:02AM +0200, Amir Goldstein wrote:
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions:
- user (or group) quotas are enabled
- A/ B/ and C/ have different owner uids (or gids)
- A/ blocks shrinks after remove of entry C/
- B/ blocks grows before adding of entry C/
- A/ ino <= XFS_DIR2_MAX_SHORT_INUM
- B/ ino > XFS_DIR2_MAX_SHORT_INUM
- C/ is converted from sf to block format, because its parent entry needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
Heh. Indeed, you only need MAXDQS==5 for filesystems that support parent pointers, because only on those filesystems can you end up needing to allocate a xattr block either to the new whiteout file or free one from the file being unlinked.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
Agree.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
I interpret the "OR" -> "and" change to reflect the V4 -> V5 transition where you actually can have all three dquot types because group/project quota are no longer mutually exclusive.
The "...involved in a transaction is 3" part I think is separate, and strange that XFS_QM_TRANS_MAXDQS wasn't updated.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com
Christoph,
This is a cognitive challenge. can you say what you where thinking in 2013 when making the comment change in the Fixes commit? Is my speculation above correct?
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
What do you all think about this?
I only think you need MAXDQS==5 for 6.12 to handle parent pointers.
Yes, of course. I just preferred to keep the 5 to avoid deviating from the upstream commit if there is no good reason to do so.
<shrug> What do Greg and Sasha think about this? If they don't mind this then I guess I don't either. ;)
The older kernels could have it set to 3 instead. struct xfs_dqtrx on a 6.17-rc6 kernel is 88 bytes. Stuffing 9 of them into struct xfs_dquot_acct instead of 15 means that the _acct struct is only 792 bytes instead of 1392, which means we can use the 1k slab instead of the 2k slab.
Huh? there is only one xfs_dquot_acct per transaction.
Yes, but there can be a lot of transactions in flight.
Does it really matter if it's 1k or 2k??
Am I missing something?
It seems silly to waste so much memory on a scenario that can't happen just so we can say that we hammered in a less appropriate solution.
--D
Thanks, Amir.
On Tue, Sep 16, 2025 at 1:40 AM Darrick J. Wong djwong@kernel.org wrote:
On Mon, Sep 15, 2025 at 10:20:40PM +0200, Amir Goldstein wrote:
On Mon, Sep 15, 2025 at 8:20 PM Darrick J. Wong djwong@kernel.org wrote:
On Sat, Sep 13, 2025 at 05:05:02AM +0200, Amir Goldstein wrote:
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions:
- user (or group) quotas are enabled
- A/ B/ and C/ have different owner uids (or gids)
- A/ blocks shrinks after remove of entry C/
- B/ blocks grows before adding of entry C/
- A/ ino <= XFS_DIR2_MAX_SHORT_INUM
- B/ ino > XFS_DIR2_MAX_SHORT_INUM
- C/ is converted from sf to block format, because its parent entry needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
Heh. Indeed, you only need MAXDQS==5 for filesystems that support parent pointers, because only on those filesystems can you end up needing to allocate a xattr block either to the new whiteout file or free one from the file being unlinked.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
Agree.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
I interpret the "OR" -> "and" change to reflect the V4 -> V5 transition where you actually can have all three dquot types because group/project quota are no longer mutually exclusive.
The "...involved in a transaction is 3" part I think is separate, and strange that XFS_QM_TRANS_MAXDQS wasn't updated.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com
Christoph,
This is a cognitive challenge. can you say what you where thinking in 2013 when making the comment change in the Fixes commit? Is my speculation above correct?
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
What do you all think about this?
I only think you need MAXDQS==5 for 6.12 to handle parent pointers.
Yes, of course. I just preferred to keep the 5 to avoid deviating from the upstream commit if there is no good reason to do so.
<shrug> What do Greg and Sasha think about this? If they don't mind this then I guess I don't either. ;)
Ok let's see.
Greg,
In kernels < 6.10 the size of the 'dqs' array per transaction was too small (XFS_QM_TRANS_MAXDQS is 2 instead of 3) which can, as explained in my commit, cause an assertion and crash the kernel.
This bug exists for a long time, it may have low probability for the entire "world", but under specific conditions (e.g. a specific workload that is fully controlled by unpriv user) it happens for us every other week on kernel 5.15.y and with more effort, an upriv user can trigger it much more frequently.
In kernel 6.10, XFS_QM_TRANS_MAXDQS was increased to 5 to cover a use case for a new feature (parent pointer). That means that upstream no longer has the bug.
I opted for applying this upstream commit to fix the stable kernel bug although raising the max to 5 is an overkill.
This has a slight impact on memory footprint, but I think it is negligible and in any case, same exact memory footprint as upstream code.
What do you prefer? Applying the commit as is with increase to 5 or apply a customized commit for kernels < 6.10 which raises the max to 3 without mentioning the upstream commit?
If you agree with my choice, please advise regarding my choice of formatting of the commit message - original commit message followed by stable specific bug fix commit message which explains the above.
The older kernels could have it set to 3 instead. struct xfs_dqtrx on a 6.17-rc6 kernel is 88 bytes. Stuffing 9 of them into struct xfs_dquot_acct instead of 15 means that the _acct struct is only 792 bytes instead of 1392, which means we can use the 1k slab instead of the 2k slab.
Huh? there is only one xfs_dquot_acct per transaction.
Yes, but there can be a lot of transactions in flight.
Does it really matter if it's 1k or 2k??
Am I missing something?
It seems silly to waste so much memory on a scenario that can't happen just so we can say that we hammered in a less appropriate solution.
Yeh, I do not like waste, but do not like to over complicate and micro optimize either.
Talking about waste, in current upstream xfs_dquot_acct is bloated as you described for the majority of users that enable quotas, who do not enable parent pointers.
So if we consider this memory waste to be a bug, I prefer that stable and upstream will be bug compatible.
If we fix that waste in upstream, we could add: Fixes: f103df763563a ("xfs: Increase XFS_QM_TRANS_MAXDQS to 5")
Which would then be flagged for picking to stable kernels that have this commit applied.
So? WDYT?
Thanks, Amir.
On Tue, Sep 16, 2025 at 08:37:54AM +0200, Amir Goldstein wrote:
On Tue, Sep 16, 2025 at 1:40 AM Darrick J. Wong djwong@kernel.org wrote:
On Mon, Sep 15, 2025 at 10:20:40PM +0200, Amir Goldstein wrote:
On Mon, Sep 15, 2025 at 8:20 PM Darrick J. Wong djwong@kernel.org wrote:
On Sat, Sep 13, 2025 at 05:05:02AM +0200, Amir Goldstein wrote:
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions:
- user (or group) quotas are enabled
- A/ B/ and C/ have different owner uids (or gids)
- A/ blocks shrinks after remove of entry C/
- B/ blocks grows before adding of entry C/
- A/ ino <= XFS_DIR2_MAX_SHORT_INUM
- B/ ino > XFS_DIR2_MAX_SHORT_INUM
- C/ is converted from sf to block format, because its parent entry needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
Heh. Indeed, you only need MAXDQS==5 for filesystems that support parent pointers, because only on those filesystems can you end up needing to allocate a xattr block either to the new whiteout file or free one from the file being unlinked.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
Agree.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
I interpret the "OR" -> "and" change to reflect the V4 -> V5 transition where you actually can have all three dquot types because group/project quota are no longer mutually exclusive.
The "...involved in a transaction is 3" part I think is separate, and strange that XFS_QM_TRANS_MAXDQS wasn't updated.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com
Christoph,
This is a cognitive challenge. can you say what you where thinking in 2013 when making the comment change in the Fixes commit? Is my speculation above correct?
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
What do you all think about this?
I only think you need MAXDQS==5 for 6.12 to handle parent pointers.
Yes, of course. I just preferred to keep the 5 to avoid deviating from the upstream commit if there is no good reason to do so.
<shrug> What do Greg and Sasha think about this? If they don't mind this then I guess I don't either. ;)
Ok let's see.
Greg,
In kernels < 6.10 the size of the 'dqs' array per transaction was too small (XFS_QM_TRANS_MAXDQS is 2 instead of 3) which can, as explained in my commit, cause an assertion and crash the kernel.
This bug exists for a long time, it may have low probability for the entire "world", but under specific conditions (e.g. a specific workload that is fully controlled by unpriv user) it happens for us every other week on kernel 5.15.y and with more effort, an upriv user can trigger it much more frequently.
In kernel 6.10, XFS_QM_TRANS_MAXDQS was increased to 5 to cover a use case for a new feature (parent pointer). That means that upstream no longer has the bug.
I opted for applying this upstream commit to fix the stable kernel bug although raising the max to 5 is an overkill.
This has a slight impact on memory footprint, but I think it is negligible and in any case, same exact memory footprint as upstream code.
What do you prefer? Applying the commit as is with increase to 5 or apply a customized commit for kernels < 6.10 which raises the max to 3 without mentioning the upstream commit?
If you agree with my choice, please advise regarding my choice of formatting of the commit message - original commit message followed by stable specific bug fix commit message which explains the above.
Original message followed by stable specific bugfix seems to make sense to me, thanks!
greg k-h
On Sat, Sep 13, 2025 at 5:05 AM Amir Goldstein amir73il@gmail.com wrote:
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions:
- user (or group) quotas are enabled
- A/ B/ and C/ have different owner uids (or gids)
- A/ blocks shrinks after remove of entry C/
- B/ blocks grows before adding of entry C/
- A/ ino <= XFS_DIR2_MAX_SHORT_INUM
- B/ ino > XFS_DIR2_MAX_SHORT_INUM
- C/ is converted from sf to block format, because its parent entry needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
Hi Catherine/Leah,
Do you plan to do a batch of backports to 6.6.y/6.1.y anytime soon. Would you mind adding this patch to your candidates for whenever you plan to test a batch?
Thanks! Amir.
On Sep 24, 2025, at 8:44 AM, Amir Goldstein amir73il@gmail.com wrote:
On Sat, Sep 13, 2025 at 5:05 AM Amir Goldstein amir73il@gmail.com wrote:
From: Allison Henderson allison.henderson@oracle.com
[ Upstream commit f103df763563ad6849307ed5985d1513acc586dd ]
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to a be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also add a helper function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de
[amir: backport to kernels prior to parent pointers to fix an old bug]
A rename operation of a directory (i.e. mv A/C/ B/) may end up changing three different dquot accounts under the following conditions:
- user (or group) quotas are enabled
- A/ B/ and C/ have different owner uids (or gids)
- A/ blocks shrinks after remove of entry C/
- B/ blocks grows before adding of entry C/
- A/ ino <= XFS_DIR2_MAX_SHORT_INUM
- B/ ino > XFS_DIR2_MAX_SHORT_INUM
- C/ is converted from sf to block format, because its parent entry
needs to be stored as 8 bytes (see xfs_dir2_sf_replace_needblock)
When all conditions are met (observed in the wild) we get this assertion:
XFS: Assertion failed: qtrx, file: fs/xfs/xfs_trans_dquot.c, line: 207
The upstream commit fixed this bug as a side effect, so decided to apply it as is rather than changing XFS_QM_TRANS_MAXDQS to 3 in stable kernels.
The Fixes commit below is NOT the commit that introduced the bug, but for some reason, which is not explained in the commit message, it fixes the comment to state that highest number of dquots of one type is 3 and not 2 (which leads to the assertion), without actually fixing it.
The change of wording from "usr, grp OR prj" to "usr, grp and prj" suggests that there may have been a confusion between "the number of dquote of one type" and "the number of dquot types" (which is also 3), so the comment change was only accidentally correct.
Fixes: 10f73d27c8e9 ("xfs: fix the comment explaining xfs_trans_dqlockedjoin") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com
Catherine and Leah,
I decided that cherry-pick this upstream commit as is with a commit message addendum was the best stable tree strategy. The commit applies cleanly to 5.15.y, so I assume it does for 6.6 and 6.1 as well. I ran my tests on 5.15.y and nothing fell out, but did not try to reproduce these complex assertion in a test.
Could you take this candidate backport patch to a spin on your test branch?
Hi Catherine/Leah,
Do you plan to do a batch of backports to 6.6.y/6.1.y anytime soon. Would you mind adding this patch to your candidates for whenever you plan to test a batch?
Thanks! Amir.
Hi Amir,
This patch looks ok to me. I’ll add it to my branch and let you know if I come across any issues. Thanks!
Catherine
linux-stable-mirror@lists.linaro.org