[PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only

Qu Wenruo

16 Dec 2021

11:47 a.m.

[BUG] The following super simple script would crash btrfs at unmount time, if CONFIG_BTRFS_ASSERT is set.

 mkfs.btrfs -f $dev
 mount $dev $mnt
 xfs_io -f -c "pwrite 0 4k" $mnt/file
 umount $mnt
 mount -o ro $dev $mnt
 btrfs scrub start -Br $mnt
 umount $mnt

This will trigger the ASSERT() introduced by commit 0a31daa4b602 ("btrfs: add assertion for empty list of transactions at late stage of umount").

That patch is definitely not the cause; it just makes enough noise for us developers to notice.

[CAUSE] We start a transaction via the following call chain during scrub:

 scrub_enumerate_chunks()
 |- btrfs_inc_block_group_ro()
    |- btrfs_join_transaction()

However, on a read-only mount there is no running transaction at all, so btrfs_join_transaction() starts a new one.

Furthermore, since it's a read-only mount, btrfs_sync_fs() will not call btrfs_commit_super() to commit the new but empty transaction.

The empty transaction is therefore still on the list at unmount, which triggers the ASSERT().

The bug has probably been there for a long time. Only the new ASSERT() makes it noisy enough to be noticed.
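
To make the sequence concrete, here is a toy userspace model of the cause, not kernel code. Every name prefixed with toy_ is hypothetical, and the transaction list is reduced to a counter. It only assumes what is described here: joining starts a new transaction when none is running, and the sync at unmount skips the commit on a read-only fs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel state; none of these are real btrfs structures. */
struct toy_fs {
	bool rdonly;          /* models sb_rdonly(fs_info->sb) */
	int  nr_transactions; /* models the length of fs_info->trans_list */
};

/* Models btrfs_join_transaction(): join a running transaction or start one. */
static void toy_join_transaction(struct toy_fs *fs)
{
	assert(fs != NULL);
	if (fs->nr_transactions == 0)
		fs->nr_transactions = 1;	/* nothing running, so a new one starts */
}

/* Models the sync at unmount: the commit is skipped on a read-only fs. */
static void toy_sync_fs(struct toy_fs *fs)
{
	assert(fs != NULL);
	if (fs->rdonly)
		return;
	fs->nr_transactions = 0;	/* models btrfs_commit_super() */
}

/* Models the pre-patch scrub path, which always joins a transaction. */
static void toy_scrub_prepatch(struct toy_fs *fs)
{
	toy_join_transaction(fs);
}

/* True when a transaction is still listed after the unmount-time sync. */
static bool toy_umount_leaves_transaction(struct toy_fs *fs)
{
	toy_sync_fs(fs);
	return fs->nr_transactions != 0;
}
```

With rdonly set, calling toy_scrub_prepatch() and then toy_umount_leaves_transaction() returns true, which is the condition the new ASSERT() complains about. With rdonly clear, the commit runs and the function returns false.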

[FIX] For a read-only scrub on a read-only mount, there is no need to start a transaction or to allocate new chunks in btrfs_inc_block_group_ro().

Just add an extra read-only mount check in btrfs_inc_block_group_ro(); if the mount is read-only, skip all chunk allocation and go to inc_block_group_ro() directly.

Since we're here, also add an extra debug message at unmount for btrfs_fs_info::trans_list. Sometimes just knowing that there are no dirty metadata bytes for an uncommitted transaction can tell us a lot of things.

Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1db24e6d6d90..702219361b12 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 	int ret;
 	bool dirty_bg_running;
 
+	/*
+	 * This can only happen when we are doing read-only scrub on read-only
+	 * mount.
+	 * In that case we should not start a new transaction on read-only fs.
+	 * Thus here we skip all chunk allocation.
+	 */
+	if (sb_rdonly(fs_info->sb)) {
+		mutex_lock(&fs_info->ro_block_group_mutex);
+		ret = inc_block_group_ro(cache, 0);
+		mutex_unlock(&fs_info->ro_block_group_mutex);
+		return ret;
+	}
+
 	do {
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans))
-- 
2.34.1
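
For readers who want to poke at the control flow outside the kernel, the early return in the patch can be sketched as a toy userspace model. This is an illustration under assumptions, not the real function: all toy_ names are hypothetical, locking is left out, the transaction list is a counter, and inc_block_group_ro() is modelled as a plain counter bump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fs {
	bool rdonly;          /* models sb_rdonly(fs_info->sb) */
	int  nr_transactions; /* models the length of fs_info->trans_list */
};

struct toy_block_group {
	int ro_count;	/* models the block group's read-only counter */
};

/* Models inc_block_group_ro(): here it only bumps a counter. */
static int toy_inc_ro(struct toy_block_group *bg)
{
	bg->ro_count++;
	return 0;
}

/* Models the patched btrfs_inc_block_group_ro(). */
static int toy_inc_block_group_ro(struct toy_fs *fs, struct toy_block_group *bg)
{
	assert(fs != NULL && bg != NULL);

	/* Read-only mount: skip the transaction and chunk allocation entirely. */
	if (fs->rdonly)
		return toy_inc_ro(bg);

	/* Read-write mount: join (or start) a transaction first, as before. */
	if (fs->nr_transactions == 0)
		fs->nr_transactions = 1;
	return toy_inc_ro(bg);
}
```

On a read-only toy_fs the call leaves nr_transactions at 0 while still marking the block group read-only, so nothing is left behind for unmount to trip over. The read-write path is unchanged.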


David Sterba

3 Jan 2022

6:52 p.m.

On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:

...

> +	/*
> +	 * This can only happen when we are doing read-only scrub on read-only
> +	 * mount.
> +	 * In that case we should not start a new transaction on read-only fs.
> +	 * Thus here we skip all chunk allocation.
> +	 */
> +	if (sb_rdonly(fs_info->sb)) {

Should this also verify or at least assert that do_chunk_alloc is not set? The scrub code is also used for replace, which can set the parameter to true.

> +		mutex_lock(&fs_info->ro_block_group_mutex);
> +		ret = inc_block_group_ro(cache, 0);
> +		mutex_unlock(&fs_info->ro_block_group_mutex);
> +		return ret;

So this is taking a shortcut and skips a few things done in the function that use the transaction. I'm not sure how safe this is: it depends on the read-only status of the superblock, which can change at any time, so what are further calls to btrfs_inc_block_group_ro() going to do regarding the transaction?

> +	}
> +
>  	do {
>  		trans = btrfs_join_transaction(root);
>  		if (IS_ERR(trans))
> --
> 2.34.1

Qu Wenruo

11:52 p.m.

On 2022/1/4 02:52, David Sterba wrote:

...

On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:

...
>> +	/*
>> +	 * This can only happen when we are doing read-only scrub on read-only
>> +	 * mount.
>> +	 * In that case we should not start a new transaction on read-only fs.
>> +	 * Thus here we skip all chunk allocation.
>> +	 */
>> +	if (sb_rdonly(fs_info->sb)) {
>
> Should this also verify or at least assert that do_chunk_alloc is not set? The scrub code is also used for replace, which can set the parameter to true.

Starting a replace needs an RW mount, so we don't need to worry about replace in this case.

>> +		mutex_lock(&fs_info->ro_block_group_mutex);
>> +		ret = inc_block_group_ro(cache, 0);
>> +		mutex_unlock(&fs_info->ro_block_group_mutex);
>> +		return ret;
>
> So this is taking a shortcut and skips a few things done in the function that use the transaction. I'm not sure how safe this is: it depends on the read-only status of the superblock, which can change at any time, so what are further calls to btrfs_inc_block_group_ro() going to do regarding the transaction?

By "any time" you mean "remount". If that's your concern, I can make remount stop a read-only scrub, just to be extra safe.

Another thing: only scrub and balance use this function, and balance needs an RW mount.

For scrub, if one scrub is already running, even if it's RO and the fs is then remounted RW, the next scrub run will return -EINPROGRESS or a similar error.

Thus I don't think we need to bother too much about this.

Thanks,
Qu

David Sterba

4 Jan 2022

6:40 p.m.

On Tue, Jan 04, 2022 at 07:52:39AM +0800, Qu Wenruo wrote:

...

On 2022/1/4 02:52, David Sterba wrote:

...
On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:

...
>>> +	/*
>>> +	 * This can only happen when we are doing read-only scrub on read-only
>>> +	 * mount.
>>> +	 * In that case we should not start a new transaction on read-only fs.
>>> +	 * Thus here we skip all chunk allocation.
>>> +	 */
>>> +	if (sb_rdonly(fs_info->sb)) {
>>
>> Should this also verify or at least assert that do_chunk_alloc is not set? The scrub code is also used for replace, which can set the parameter to true.
>
> Starting a replace needs an RW mount, so we don't need to worry about replace in this case.

What if replace starts on an rw mount, and then it's flipped to read-only? I don't see how this is prevented (like by mnt_want_write). It should not cause any problems either, as it would not start the transaction.

>>> +		mutex_lock(&fs_info->ro_block_group_mutex);
>>> +		ret = inc_block_group_ro(cache, 0);
>>> +		mutex_unlock(&fs_info->ro_block_group_mutex);
>>> +		return ret;
>>
>> So this is taking a shortcut and skips a few things done in the function that use the transaction. I'm not sure how safe this is: it depends on the read-only status of the superblock, which can change at any time, so what are further calls to btrfs_inc_block_group_ro() going to do regarding the transaction?
>
> By "any time" you mean "remount". If that's your concern, I can make remount stop a read-only scrub, just to be extra safe.

If scrub is running in read-only mode then it's fine; the corner cases I'm interested in are some mixture of read-write/read-only on the filesystem and scrub, and when they get out of sync.

> Another thing: only scrub and balance use this function, and balance needs an RW mount.
>
> For scrub, if one scrub is already running, even if it's RO and the fs is then remounted RW, the next scrub run will return -EINPROGRESS or a similar error.
>
> Thus I don't think we need to bother too much about this.

It's not about another scrub running, that won't work, but what if scrub is started, and then at some point the filesystem gets remounted read-only? Both can be done without any notification by any system tool or service. So if there's no problematic case, then ok; I'm probably not understanding it completely yet so I'm asking. If it works by accident or there's a corner case left I'd rather find it now.

Qu Wenruo

10:13 p.m.

On 2022/1/5 02:40, David Sterba wrote:

...

On Tue, Jan 04, 2022 at 07:52:39AM +0800, Qu Wenruo wrote:

...
On 2022/1/4 02:52, David Sterba wrote:

...
On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:

...
>>>> +	/*
>>>> +	 * This can only happen when we are doing read-only scrub on read-only
>>>> +	 * mount.
>>>> +	 * In that case we should not start a new transaction on read-only fs.
>>>> +	 * Thus here we skip all chunk allocation.
>>>> +	 */
>>>> +	if (sb_rdonly(fs_info->sb)) {
>>>
>>> Should this also verify or at least assert that do_chunk_alloc is not set? The scrub code is also used for replace, which can set the parameter to true.
>>
>> Starting a replace needs an RW mount, so we don't need to worry about replace in this case.
>
> What if replace starts on an rw mount, and then it's flipped to read-only? I don't see how this is prevented (like by mnt_want_write). It should not cause any problems either, as it would not start the transaction.

For this case, there are 2 entrances:

- Remount RO
  We will stop replace in that case.

- Some fs error (like trans abort)
  I believe we should fail at any transaction start.

Thanks,
Qu
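
The two entrances can be pictured with a toy userspace model. This is only a sketch of the behaviour as described in this reply, not kernel code: the toy_ names and fields are hypothetical, and the -EROFS return value and the "error also flips the fs read-only" step are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state only; field and function names are hypothetical, not btrfs API. */
struct toy_fs {
	bool rdonly;          /* mount is read-only */
	bool error;           /* fs hit an error such as a transaction abort */
	bool replace_running; /* a device replace is in progress */
};

/* Entrance 1: a remount to read-only stops the running replace first. */
static void toy_remount_ro(struct toy_fs *fs)
{
	assert(fs != NULL);
	fs->replace_running = false;
	fs->rdonly = true;
}

/* Entrance 2 (assumed): an fs error flips the fs read-only without a remount. */
static void toy_fs_error(struct toy_fs *fs)
{
	assert(fs != NULL);
	fs->error = true;
	fs->rdonly = true;
}

/* Models a transaction start that is refused once the fs is in error state. */
static int toy_start_transaction(const struct toy_fs *fs)
{
	assert(fs != NULL);
	return fs->error ? -EROFS : 0;	/* error code is an assumption */
}
```

In this model a replace never observes a read-only fs through remount, because it is stopped first, and after an fs error any attempt to start a transaction fails before it can be left dangling.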


David Sterba

6 Jan 2022

3:18 p.m.

On Wed, Jan 05, 2022 at 06:13:09AM +0800, Qu Wenruo wrote:

>>>> Should this also verify or at least assert that do_chunk_alloc is not set? The scrub code is also used for replace, which can set the parameter to true.
>>>
>>> Starting a replace needs an RW mount, so we don't need to worry about replace in this case.
>>
>> What if replace starts on an rw mount, and then it's flipped to read-only? I don't see how this is prevented (like by mnt_want_write). It should not cause any problems either, as it would not start the transaction.
>
> For this case, there are 2 entrances:
>
> - Remount RO
>   We will stop replace in that case.
>
> - Some fs error (like trans abort)
>   I believe we should fail at any transaction start.

Right, thanks, that was the missing piece.


linux-stable-mirror@lists.linaro.org
