On Sat 16-07-22 11:00:46, Zhihao Cheng wrote:
On 2022/7/12 18:54, Jan Kara wrote:

Hi Jan, one little question confuses me:
When ext4_xattr_block_set() decides to remove xattr block the following
race can happen:

CPU1                                    CPU2
ext4_xattr_block_set()
  new_bh = ext4_xattr_block_cache_find()

                                        ext4_xattr_release_block()
                                          lock_buffer(bh);
                                          ref = le32_to_cpu(BHDR(bh)->h_refcount);
                                          if (ref == 1) {
                                            ...
                                            mb_cache_entry_delete();
                                            unlock_buffer(bh);
                                            ext4_free_blocks();
                                              ...
                                              ext4_forget(..., bh, ...);
                                                jbd2_journal_revoke(..., bh);

  ext4_journal_get_write_access(..., new_bh, ...)
    do_get_write_access()
      jbd2_journal_cancel_revoke(..., new_bh);
Later the code in ext4_xattr_block_set() finds out the block got freed and cancels reusal of the block but the revoke stays canceled and so in case of block reuse and journal replay the filesystem can get corrupted. If the race works out slightly differently, we can also hit assertions in the jbd2 code.
Fix the problem by making sure that once matching mbcache entry is found, code dropping the last xattr block reference (or trying to modify xattr block in place) waits until the mbcache entry reference is dropped. This way code trying to reuse xattr block is protected from someone trying to drop the last reference to xattr block.
Reported-and-tested-by: Ritesh Harjani <ritesh.list@gmail.com>
CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
...
@@ -1991,18 +2020,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			lock_buffer(new_bh);
 			/*
 			 * We have to be careful about races with
-			 * freeing, rehashing or adding references to
-			 * xattr block. Once we hold buffer lock xattr
-			 * block's state is stable so we can check
-			 * whether the block got freed / rehashed or
-			 * not. Since we unhash mbcache entry under
-			 * buffer lock when freeing / rehashing xattr
-			 * block, checking whether entry is still
-			 * hashed is reliable. Same rules hold for
-			 * e_reusable handling.
+			 * adding references to xattr block. Once we
+			 * hold buffer lock xattr block's state is
+			 * stable so we can check the additional
+			 * reference fits.
 			 */
-			if (hlist_bl_unhashed(&ce->e_hash_list) ||
-			    !ce->e_reusable) {
+			ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1;
+			if (ref > EXT4_XATTR_REFCOUNT_MAX) {
So far, we have mb_cache_entry_delete_or_get() and mb_cache_entry_wait_unused(), so a used cache entry cannot be removed concurrently. Removing the check 'hlist_bl_unhashed(&ce->e_hash_list)' is okay.

What is the effect of changing the other two checks, 'ref >= EXT4_XATTR_REFCOUNT_MAX' and '!ce->e_reusable'? Is it to make the code more obvious? E.g. to point out that the condition is 'ref <= EXT4_XATTR_REFCOUNT_MAX' rather than 'ce->e_reusable', since we have already checked 'ce->e_reusable' in ext4_xattr_block_cache_find() before?
Well, ce->e_reusable is set if and only if BHDR(new_bh)->h_refcount < EXT4_XATTR_REFCOUNT_MAX. So checking whether the refcount is small enough is all that is needed and we don't need the ce->e_reusable check here.
Honza