Linux-stable-mirror June 2019

linux-stable-mirror@lists.linaro.org

338 participants
1243 discussions

FAILED: patch "[PATCH] btrfs: Ensure replaced device doesn't have pending chunk" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From debd1c065d2037919a7da67baf55cc683fee09f0 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov <nborisov(a)suse.com> Date: Fri, 17 May 2019 10:44:25 +0300 Subject: [PATCH] btrfs: Ensure replaced device doesn't have pending chunk allocation Recent FITRIM work, namely bbbf7243d62d ("btrfs: combine device update operations during transaction commit") combined the way certain operations are recoded in a transaction. As a result an ASSERT was added in dev_replace_finish to ensure the new code works correctly. Unfortunately I got reports that it's possible to trigger the assert, meaning that during a device replace it's possible to have an unfinished chunk allocation on the source device. This is supposed to be prevented by the fact that a transaction is committed before finishing the replace oepration and alter acquiring the chunk mutex. This is not sufficient since by the time the transaction is committed and the chunk mutex acquired it's possible to allocate a chunk depending on the workload being executed on the replaced device. This bug has been present ever since device replace was introduced but there was never code which checks for it. The correct way to fix is to ensure that there is no pending device modification operation when the chunk mutex is acquire and if there is repeat transaction commit. Unfortunately it's not possible to just exclude the source device from btrfs_fs_devices::dev_alloc_list since this causes ENOSPC to be hit in transaction commit. Fixing that in another way would need to add special cases to handle the last writes and forbid new ones. The looped transaction fix is more obvious, and can be easily backported. The runtime of dev-replace is long so there's no noticeable delay caused by that. Reported-by: David Sterba <dsterba(a)suse.com> Fixes: 391cd9df81ac ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace") CC: stable(a)vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov <nborisov(a)suse.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 55c15f31d00d..ee0989c7e3a9 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -603,17 +603,33 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, } btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1); - trans = btrfs_start_transaction(root, 0); - if (IS_ERR(trans)) { - mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); - return PTR_ERR(trans); + /* + * We have to use this loop approach because at this point src_device + * has to be available for transaction commit to complete, yet new + * chunks shouldn't be allocated on the device. + */ + while (1) { + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) { + mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); + return PTR_ERR(trans); + } + ret = btrfs_commit_transaction(trans); + WARN_ON(ret); + + /* Prevent write_all_supers() during the finishing procedure */ + mutex_lock(&fs_info->fs_devices->device_list_mutex); + /* Prevent new chunks being allocated on the source device */ + mutex_lock(&fs_info->chunk_mutex); + + if (!list_empty(&src_device->post_commit_list)) { + mutex_unlock(&fs_info->fs_devices->device_list_mutex); + mutex_unlock(&fs_info->chunk_mutex); + } else { + break; + } } - ret = btrfs_commit_transaction(trans); - WARN_ON(ret); - /* keep away write_all_supers() during the finishing procedure */ - mutex_lock(&fs_info->fs_devices->device_list_mutex); - mutex_lock(&fs_info->chunk_mutex); down_write(&dev_replace->rwsem); dev_replace->replace_state = scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED @@ -662,7 +678,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, btrfs_device_set_disk_total_bytes(tgt_device, src_device->disk_total_bytes); btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used); - ASSERT(list_empty(&src_device->post_commit_list)); tgt_device->commit_total_bytes = src_device->commit_total_bytes; tgt_device->commit_bytes_used = src_device->bytes_used;

6 years, 7 months

FAILED: patch "[PATCH] btrfs: Ensure replaced device doesn't have pending chunk" failed to apply to 5.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From debd1c065d2037919a7da67baf55cc683fee09f0 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov <nborisov(a)suse.com> Date: Fri, 17 May 2019 10:44:25 +0300 Subject: [PATCH] btrfs: Ensure replaced device doesn't have pending chunk allocation Recent FITRIM work, namely bbbf7243d62d ("btrfs: combine device update operations during transaction commit") combined the way certain operations are recoded in a transaction. As a result an ASSERT was added in dev_replace_finish to ensure the new code works correctly. Unfortunately I got reports that it's possible to trigger the assert, meaning that during a device replace it's possible to have an unfinished chunk allocation on the source device. This is supposed to be prevented by the fact that a transaction is committed before finishing the replace oepration and alter acquiring the chunk mutex. This is not sufficient since by the time the transaction is committed and the chunk mutex acquired it's possible to allocate a chunk depending on the workload being executed on the replaced device. This bug has been present ever since device replace was introduced but there was never code which checks for it. The correct way to fix is to ensure that there is no pending device modification operation when the chunk mutex is acquire and if there is repeat transaction commit. Unfortunately it's not possible to just exclude the source device from btrfs_fs_devices::dev_alloc_list since this causes ENOSPC to be hit in transaction commit. Fixing that in another way would need to add special cases to handle the last writes and forbid new ones. The looped transaction fix is more obvious, and can be easily backported. The runtime of dev-replace is long so there's no noticeable delay caused by that. Reported-by: David Sterba <dsterba(a)suse.com> Fixes: 391cd9df81ac ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace") CC: stable(a)vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov <nborisov(a)suse.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 55c15f31d00d..ee0989c7e3a9 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -603,17 +603,33 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, } btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1); - trans = btrfs_start_transaction(root, 0); - if (IS_ERR(trans)) { - mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); - return PTR_ERR(trans); + /* + * We have to use this loop approach because at this point src_device + * has to be available for transaction commit to complete, yet new + * chunks shouldn't be allocated on the device. + */ + while (1) { + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) { + mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); + return PTR_ERR(trans); + } + ret = btrfs_commit_transaction(trans); + WARN_ON(ret); + + /* Prevent write_all_supers() during the finishing procedure */ + mutex_lock(&fs_info->fs_devices->device_list_mutex); + /* Prevent new chunks being allocated on the source device */ + mutex_lock(&fs_info->chunk_mutex); + + if (!list_empty(&src_device->post_commit_list)) { + mutex_unlock(&fs_info->fs_devices->device_list_mutex); + mutex_unlock(&fs_info->chunk_mutex); + } else { + break; + } } - ret = btrfs_commit_transaction(trans); - WARN_ON(ret); - /* keep away write_all_supers() during the finishing procedure */ - mutex_lock(&fs_info->fs_devices->device_list_mutex); - mutex_lock(&fs_info->chunk_mutex); down_write(&dev_replace->rwsem); dev_replace->replace_state = scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED @@ -662,7 +678,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, btrfs_device_set_disk_total_bytes(tgt_device, src_device->disk_total_bytes); btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used); - ASSERT(list_empty(&src_device->post_commit_list)); tgt_device->commit_total_bytes = src_device->commit_total_bytes; tgt_device->commit_bytes_used = src_device->bytes_used;

6 years, 7 months

+ mm-mmu_gather-remove-__tlb_reset_range-for-force-flush.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: mm: mmu_gather: remove __tlb_reset_range() for force flush has been added to the -mm tree. Its filename is mm-mmu_gather-remove-__tlb_reset_range-for-force-flush.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-mmu_gather-remove-__tlb_reset_r… and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-mmu_gather-remove-__tlb_reset_r… Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Yang Shi <yang.shi(a)linux.alibaba.com> Subject: mm: mmu_gather: remove __tlb_reset_range() for force flush A few new fields were added to mmu_gather to make TLB flush smarter for huge page by telling what level of page table is changed. __tlb_reset_range() is used to reset all these page table state to unchanged, which is called by TLB flush for parallel mapping changes for the same range under non-exclusive lock (i.e. read mmap_sem). Before commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap"), the syscalls (e.g. MADV_DONTNEED, MADV_FREE) which may update PTEs in parallel don't remove page tables. But, the forementioned commit may do munmap() under read mmap_sem and free page tables. This may result in program hang on aarch64 reported by Jan Stancek. The problem could be reproduced by his test program with slightly modified below. ---8<--- static int map_size = 4096; static int num_iter = 500; static long threads_total; static void *distant_area; void *map_write_unmap(void *ptr) { int *fd = ptr; unsigned char *map_address; int i, j = 0; for (i = 0; i < num_iter; i++) { map_address = mmap(distant_area, (size_t) map_size, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (map_address == MAP_FAILED) { perror("mmap"); exit(1); } for (j = 0; j < map_size; j++) map_address[j] = 'b'; if (munmap(map_address, map_size) == -1) { perror("munmap"); exit(1); } } return NULL; } void *dummy(void *ptr) { return NULL; } int main(void) { pthread_t thid[2]; /* hint for mmap in map_write_unmap() */ distant_area = mmap(0, DISTANT_MMAP_SIZE, PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); munmap(distant_area, (size_t)DISTANT_MMAP_SIZE); distant_area += DISTANT_MMAP_SIZE / 2; while (1) { pthread_create(&thid[0], NULL, map_write_unmap, NULL); pthread_create(&thid[1], NULL, dummy, NULL); pthread_join(thid[0], NULL); pthread_join(thid[1], NULL); } } ---8<--- The program may bring in parallel execution like below: t1 t2 munmap(map_address) downgrade_write(&mm->mmap_sem); unmap_region() tlb_gather_mmu() inc_tlb_flush_pending(tlb->mm); free_pgtables() tlb->freed_tables = 1 tlb->cleared_pmds = 1 pthread_exit() madvise(thread_stack, 8M, MADV_DONTNEED) zap_page_range() tlb_gather_mmu() inc_tlb_flush_pending(tlb->mm); tlb_finish_mmu() if (mm_tlb_flush_nested(tlb->mm)) __tlb_reset_range() __tlb_reset_range() would reset freed_tables and cleared_* bits, but this may cause inconsistency for munmap() which do free page tables. Then it may result in some architectures, e.g. aarch64, may not flush TLB completely as expected to have stale TLB entries remained. Use fullmm flush since it yields much better performance on aarch64 and non-fullmm doesn't yields significant difference on x86. The original proposed fix came from Jan Stancek who mainly debugged this issue, I just wrapped up everything together. Link: http://lkml.kernel.org/r/1558322252-113575-1-git-send-email-yang.shi@linux.… Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com> Signed-off-by: Jan Stancek <jstancek(a)redhat.com> Reported-by: Jan Stancek <jstancek(a)redhat.com> Tested-by: Jan Stancek <jstancek(a)redhat.com> Suggested-by: Will Deacon <will.deacon(a)arm.com> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Nick Piggin <npiggin(a)gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com> Cc: Nadav Amit <namit(a)vmware.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Mel Gorman <mgorman(a)suse.de> Cc: <stable(a)vger.kernel.org> [4.20+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mmu_gather.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) --- a/mm/mmu_gather.c~mm-mmu_gather-remove-__tlb_reset_range-for-force-flush +++ a/mm/mmu_gather.c @@ -245,14 +245,28 @@ void tlb_finish_mmu(struct mmu_gather *t { /* * If there are parallel threads are doing PTE changes on same range - * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB - * flush by batching, a thread has stable TLB entry can fail to flush - * the TLB by observing pte_none|!pte_dirty, for example so flush TLB - * forcefully if we detect parallel PTE batching threads. + * under non-exclusive lock (e.g., mmap_sem read-side) but defer TLB + * flush by batching, one thread may end up seeing inconsistent PTEs + * and result in having stale TLB entries. So flush TLB forcefully + * if we detect parallel PTE batching threads. + * + * However, some syscalls, e.g. munmap(), may free page tables, this + * needs force flush everything in the given range. Otherwise this + * may result in having stale TLB entries for some architectures, + * e.g. aarch64, that could specify flush what level TLB. */ if (mm_tlb_flush_nested(tlb->mm)) { + /* + * The aarch64 yields better performance with fullmm by + * avoiding multiple CPUs spamming TLBI messages at the + * same time. + * + * On x86 non-fullmm doesn't yield significant difference + * against fullmm. + */ + tlb->fullmm = 1; __tlb_reset_range(tlb); - __tlb_adjust_range(tlb, start, end - start); + tlb->freed_tables = 1; } tlb_flush_mmu(tlb); _ Patches currently in -mm which might be from yang.shi(a)linux.alibaba.com are mm-mmu_gather-remove-__tlb_reset_range-for-force-flush.patch

6 years, 7 months

FAILED: patch "[PATCH] Btrfs: incremental send, fix file corruption when no-holes" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 6b1f72e5b82a5c2a4da4d1ebb8cc01913ddbea21 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Mon, 20 May 2019 09:55:42 +0100 Subject: [PATCH] Btrfs: incremental send, fix file corruption when no-holes feature is enabled When using the no-holes feature, if we have a file with prealloc extents with a start offset beyond the file's eof, doing an incremental send can cause corruption of the file due to incorrect hole detection. Such case requires that the prealloc extent(s) exist in both the parent and send snapshots, and that a hole is punched into the file that covers all its extents that do not cross the eof boundary. Example reproducer: $ mkfs.btrfs -f -O no-holes /dev/sdb $ mount /dev/sdb /mnt/sdb $ xfs_io -f -c "pwrite -S 0xab 0 500K" /mnt/sdb/foobar $ xfs_io -c "falloc -k 1200K 800K" /mnt/sdb/foobar $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base $ btrfs send -f /tmp/base.snap /mnt/sdb/base $ xfs_io -c "fpunch 0 500K" /mnt/sdb/foobar $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr $ btrfs send -p /mnt/sdb/base -f /tmp/incr.snap /mnt/sdb/incr $ md5sum /mnt/sdb/incr/foobar 816df6f64deba63b029ca19d880ee10a /mnt/sdb/incr/foobar $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ btrfs receive -f /tmp/base.snap /mnt/sdc $ btrfs receive -f /tmp/incr.snap /mnt/sdc $ md5sum /mnt/sdc/incr/foobar cf2ef71f4a9e90c2f6013ba3b2257ed2 /mnt/sdc/incr/foobar --> Different checksum, because the prealloc extent beyond the file's eof confused the hole detection code and it assumed a hole starting at offset 0 and ending at the offset of the prealloc extent (1200Kb) instead of ending at the offset 500Kb (the file's size). Fix this by ensuring we never cross the file's size when issuing the write operations for a hole. Fixes: 16e7549f045d33 ("Btrfs: incompatible format change to remove hole extents") CC: stable(a)vger.kernel.org # 3.14+ Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index dd38dfe174df..ca30b1f06e2d 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4999,6 +4999,12 @@ static int send_hole(struct send_ctx *sctx, u64 end) if (offset >= sctx->cur_inode_size) return 0; + /* + * Don't go beyond the inode's i_size due to prealloc extents that start + * after the i_size. + */ + end = min_t(u64, end, sctx->cur_inode_size); + if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) return send_update_extent(sctx, offset, end - offset);

6 years, 7 months

FAILED: patch "[PATCH] Btrfs: incremental send, fix file corruption when no-holes" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 6b1f72e5b82a5c2a4da4d1ebb8cc01913ddbea21 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Mon, 20 May 2019 09:55:42 +0100 Subject: [PATCH] Btrfs: incremental send, fix file corruption when no-holes feature is enabled When using the no-holes feature, if we have a file with prealloc extents with a start offset beyond the file's eof, doing an incremental send can cause corruption of the file due to incorrect hole detection. Such case requires that the prealloc extent(s) exist in both the parent and send snapshots, and that a hole is punched into the file that covers all its extents that do not cross the eof boundary. Example reproducer: $ mkfs.btrfs -f -O no-holes /dev/sdb $ mount /dev/sdb /mnt/sdb $ xfs_io -f -c "pwrite -S 0xab 0 500K" /mnt/sdb/foobar $ xfs_io -c "falloc -k 1200K 800K" /mnt/sdb/foobar $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base $ btrfs send -f /tmp/base.snap /mnt/sdb/base $ xfs_io -c "fpunch 0 500K" /mnt/sdb/foobar $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr $ btrfs send -p /mnt/sdb/base -f /tmp/incr.snap /mnt/sdb/incr $ md5sum /mnt/sdb/incr/foobar 816df6f64deba63b029ca19d880ee10a /mnt/sdb/incr/foobar $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ btrfs receive -f /tmp/base.snap /mnt/sdc $ btrfs receive -f /tmp/incr.snap /mnt/sdc $ md5sum /mnt/sdc/incr/foobar cf2ef71f4a9e90c2f6013ba3b2257ed2 /mnt/sdc/incr/foobar --> Different checksum, because the prealloc extent beyond the file's eof confused the hole detection code and it assumed a hole starting at offset 0 and ending at the offset of the prealloc extent (1200Kb) instead of ending at the offset 500Kb (the file's size). Fix this by ensuring we never cross the file's size when issuing the write operations for a hole. Fixes: 16e7549f045d33 ("Btrfs: incompatible format change to remove hole extents") CC: stable(a)vger.kernel.org # 3.14+ Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index dd38dfe174df..ca30b1f06e2d 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4999,6 +4999,12 @@ static int send_hole(struct send_ctx *sctx, u64 end) if (offset >= sctx->cur_inode_size) return 0; + /* + * Don't go beyond the inode's i_size due to prealloc extents that start + * after the i_size. + */ + end = min_t(u64, end, sctx->cur_inode_size); + if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) return send_update_extent(sctx, offset, end - offset);

6 years, 7 months

FAILED: patch "[PATCH] Btrfs: fix fsync not persisting changed attributes of a" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 60d9f50308e5df19bc18c2fefab0eba4a843900a Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Thu, 16 May 2019 15:48:55 +0100 Subject: [PATCH] Btrfs: fix fsync not persisting changed attributes of a directory While logging an inode we follow its ancestors and for each one we mark it as logged in the current transaction, even if we have not logged it. As a consequence if we change an attribute of an ancestor, such as the UID or GID for example, and then explicitly fsync it, we end up not logging the inode at all despite returning success to user space, which results in the attribute being lost if a power failure happens after the fsync. Sample reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ chown 6007:6007 /mnt/dir $ sync $ chown 9003:9003 /mnt/dir $ touch /mnt/dir/file $ xfs_io -c fsync /mnt/dir/file # fsync our directory after fsync'ing the new file, should persist the # new values for the uid and gid. $ xfs_io -c fsync /mnt/dir <power failure> $ mount /dev/sdb /mnt $ stat -c %u:%g /mnt/dir 6007:6007 --> should be 9003:9003, the uid and gid were not persisted, despite the explicit fsync on the directory prior to the power failure Fix this by not updating the logged_trans field of ancestor inodes when logging an inode, since we have not logged them. Let only future calls to btrfs_log_inode() to mark inodes as logged. This could be triggered by my recent fsync fuzz tester for fstests, for which an fstests patch exists titled "fstests: generic, fsync fuzz tester with fsstress". Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes") CC: stable(a)vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 6c47f6ed3e94..de729acee738 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5478,7 +5478,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans, { int ret = 0; struct dentry *old_parent = NULL; - struct btrfs_inode *orig_inode = inode; /* * for regular files, if its inode is already on disk, we don't @@ -5498,16 +5497,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans, } while (1) { - /* - * If we are logging a directory then we start with our inode, - * not our parent's inode, so we need to skip setting the - * logged_trans so that further down in the log code we don't - * think this inode has already been logged. - */ - if (inode != orig_inode) - inode->logged_trans = trans->transid; - smp_mb(); - if (btrfs_must_commit_transaction(trans, inode)) { ret = 1; break; @@ -6384,7 +6373,6 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans, * if this directory was already logged any new * names for this file/dir will get recorded */ - smp_mb(); if (dir->logged_trans == trans->transid) return;

6 years, 7 months

FAILED: patch "[PATCH] Btrfs: fix fsync not persisting changed attributes of a" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 60d9f50308e5df19bc18c2fefab0eba4a843900a Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Thu, 16 May 2019 15:48:55 +0100 Subject: [PATCH] Btrfs: fix fsync not persisting changed attributes of a directory While logging an inode we follow its ancestors and for each one we mark it as logged in the current transaction, even if we have not logged it. As a consequence if we change an attribute of an ancestor, such as the UID or GID for example, and then explicitly fsync it, we end up not logging the inode at all despite returning success to user space, which results in the attribute being lost if a power failure happens after the fsync. Sample reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ chown 6007:6007 /mnt/dir $ sync $ chown 9003:9003 /mnt/dir $ touch /mnt/dir/file $ xfs_io -c fsync /mnt/dir/file # fsync our directory after fsync'ing the new file, should persist the # new values for the uid and gid. $ xfs_io -c fsync /mnt/dir <power failure> $ mount /dev/sdb /mnt $ stat -c %u:%g /mnt/dir 6007:6007 --> should be 9003:9003, the uid and gid were not persisted, despite the explicit fsync on the directory prior to the power failure Fix this by not updating the logged_trans field of ancestor inodes when logging an inode, since we have not logged them. Let only future calls to btrfs_log_inode() to mark inodes as logged. This could be triggered by my recent fsync fuzz tester for fstests, for which an fstests patch exists titled "fstests: generic, fsync fuzz tester with fsstress". Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes") CC: stable(a)vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 6c47f6ed3e94..de729acee738 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5478,7 +5478,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans, { int ret = 0; struct dentry *old_parent = NULL; - struct btrfs_inode *orig_inode = inode; /* * for regular files, if its inode is already on disk, we don't @@ -5498,16 +5497,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans, } while (1) { - /* - * If we are logging a directory then we start with our inode, - * not our parent's inode, so we need to skip setting the - * logged_trans so that further down in the log code we don't - * think this inode has already been logged. - */ - if (inode != orig_inode) - inode->logged_trans = trans->transid; - smp_mb(); - if (btrfs_must_commit_transaction(trans, inode)) { ret = 1; break; @@ -6384,7 +6373,6 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans, * if this directory was already logged any new * names for this file/dir will get recorded */ - smp_mb(); if (dir->logged_trans == trans->transid) return;

6 years, 7 months

FAILED: patch "[PATCH] Btrfs: fix wrong ctime and mtime of a directory after log" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5338e43abbab13791144d37fd8846847062351c6 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Wed, 15 May 2019 16:02:47 +0100 Subject: [PATCH] Btrfs: fix wrong ctime and mtime of a directory after log replay When replaying a log that contains a new file or directory name that needs to be added to its parent directory, we end up updating the mtime and the ctime of the parent directory to the current time after we have set their values to the correct ones (set at fsync time), efectivelly losing them. Sample reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/file # fsync of the directory is optional, not needed $ xfs_io -c fsync /mnt/dir $ xfs_io -c fsync /mnt/dir/file $ stat -c %Y /mnt/dir 1557856079 <power failure> $ sleep 3 $ mount /dev/sdb /mnt $ stat -c %Y /mnt/dir 1557856082 --> should have been 1557856079, the mtime is updated to the current time when replaying the log Fix this by not updating the mtime and ctime to the current time at btrfs_add_link() when we are replaying a log tree. This could be triggered by my recent fsync fuzz tester for fstests, for which an fstests patch exists titled "fstests: generic, fsync fuzz tester with fsstress". Fixes: e02119d5a7b43 ("Btrfs: Add a write ahead tree log to optimize synchronous operations") CC: stable(a)vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b6d549c993f6..6bebc0ca751d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6433,8 +6433,18 @@ int btrfs_add_link(struct btrfs_trans_handle *trans, btrfs_i_size_write(parent_inode, parent_inode->vfs_inode.i_size + name_len * 2); inode_inc_iversion(&parent_inode->vfs_inode); - parent_inode->vfs_inode.i_mtime = parent_inode->vfs_inode.i_ctime = - current_time(&parent_inode->vfs_inode); + /* + * If we are replaying a log tree, we do not want to update the mtime + * and ctime of the parent directory with the current time, since the + * log replay procedure is responsible for setting them to their correct + * values (the ones it had when the fsync was done). + */ + if (!test_bit(BTRFS_FS_LOG_RECOVERING, &root->fs_info->flags)) { + struct timespec64 now = current_time(&parent_inode->vfs_inode); + + parent_inode->vfs_inode.i_mtime = now; + parent_inode->vfs_inode.i_ctime = now; + } ret = btrfs_update_inode(trans, root, &parent_inode->vfs_inode); if (ret) btrfs_abort_transaction(trans, ret);

6 years, 7 months

FAILED: patch "[PATCH] Btrfs: fix wrong ctime and mtime of a directory after log" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5338e43abbab13791144d37fd8846847062351c6 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Wed, 15 May 2019 16:02:47 +0100 Subject: [PATCH] Btrfs: fix wrong ctime and mtime of a directory after log replay When replaying a log that contains a new file or directory name that needs to be added to its parent directory, we end up updating the mtime and the ctime of the parent directory to the current time after we have set their values to the correct ones (set at fsync time), efectivelly losing them. Sample reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/file # fsync of the directory is optional, not needed $ xfs_io -c fsync /mnt/dir $ xfs_io -c fsync /mnt/dir/file $ stat -c %Y /mnt/dir 1557856079 <power failure> $ sleep 3 $ mount /dev/sdb /mnt $ stat -c %Y /mnt/dir 1557856082 --> should have been 1557856079, the mtime is updated to the current time when replaying the log Fix this by not updating the mtime and ctime to the current time at btrfs_add_link() when we are replaying a log tree. This could be triggered by my recent fsync fuzz tester for fstests, for which an fstests patch exists titled "fstests: generic, fsync fuzz tester with fsstress". Fixes: e02119d5a7b43 ("Btrfs: Add a write ahead tree log to optimize synchronous operations") CC: stable(a)vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b6d549c993f6..6bebc0ca751d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6433,8 +6433,18 @@ int btrfs_add_link(struct btrfs_trans_handle *trans, btrfs_i_size_write(parent_inode, parent_inode->vfs_inode.i_size + name_len * 2); inode_inc_iversion(&parent_inode->vfs_inode); - parent_inode->vfs_inode.i_mtime = parent_inode->vfs_inode.i_ctime = - current_time(&parent_inode->vfs_inode); + /* + * If we are replaying a log tree, we do not want to update the mtime + * and ctime of the parent directory with the current time, since the + * log replay procedure is responsible for setting them to their correct + * values (the ones it had when the fsync was done). + */ + if (!test_bit(BTRFS_FS_LOG_RECOVERING, &root->fs_info->flags)) { + struct timespec64 now = current_time(&parent_inode->vfs_inode); + + parent_inode->vfs_inode.i_mtime = now; + parent_inode->vfs_inode.i_ctime = now; + } ret = btrfs_update_inode(trans, root, &parent_inode->vfs_inode); if (ret) btrfs_abort_transaction(trans, ret);

6 years, 7 months

Linux 5.1.7

by Greg KH

I'm announcing the release of the 5.1.7 kernel. All users of the 5.1 kernel series must upgrade. The updated 5.1.y git tree can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.1.y and can be browsed at the normal kernel.org git web browser: https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary thanks, greg k-h ------------ Makefile | 2 drivers/crypto/vmx/ghash.c | 212 ++++++----------- drivers/net/bonding/bond_main.c | 15 - drivers/net/dsa/mv88e6xxx/chip.c | 2 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 - drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 2 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 2 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 5 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 11 drivers/net/ethernet/freescale/fec_main.c | 2 drivers/net/ethernet/marvell/mvneta.c | 4 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 10 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 + drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 6 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c | 11 drivers/net/ethernet/realtek/r8169.c | 3 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 4 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 3 drivers/net/phy/marvell10g.c | 13 + drivers/net/usb/usbnet.c | 6 include/linux/siphash.h | 5 include/net/netns/ipv4.h | 2 include/uapi/linux/tipc_config.h | 10 net/core/dev.c | 2 net/core/ethtool.c | 8 net/core/skbuff.c | 6 net/ipv4/igmp.c | 47 ++- net/ipv4/ip_output.c | 4 net/ipv4/route.c | 12 net/ipv6/ip6_output.c | 4 net/ipv6/output_core.c | 30 +- net/ipv6/raw.c | 2 net/ipv6/route.c | 6 net/llc/llc_output.c | 2 net/sched/act_api.c | 3 net/tipc/core.c | 32 +- net/tipc/subscr.h | 5 net/tipc/topsrv.c | 14 - net/tls/tls_device.c | 9 net/tls/tls_sw.c | 19 - tools/testing/selftests/net/tls.c | 34 ++ 43 files changed, 359 insertions(+), 256 deletions(-) Andy Duan (1): net: fec: fix the clk mismatch in failed_reset path Antoine Tenart (1): net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value Chris Packham (1): tipc: Avoid copying bytes beyond the supplied data Daniel Axtens (1): crypto: vmx - ghash: do nosimd fallback manually David Ahern (1): ipv6: Fix redirect with VRF David S. Miller (1): Revert "tipc: fix modprobe tipc failed after switch order of device registration" Eric Dumazet (5): inet: switch IP ID generator to siphash ipv4/igmp: fix another memory leak in igmpv3_del_delrec() ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST llc: fix skb leak in llc_build_and_send_ui_pkt() net-gro: fix use-after-free read in napi_gro_frags() Greg Kroah-Hartman (1): Linux 5.1.7 Heiner Kallweit (1): r8169: fix MAC address being lost in PCI D3 Jakub Kicinski (6): net/tls: fix lowat calculation if some data came from previous record selftests/tls: test for lowat overshoot with multiple records net/tls: fix no wakeup on partial reads selftests/tls: add test for sleeping even though there is data net/tls: fix state removal with feature flags off net/tls: don't ignore netdev notifications if no TLS features Jarod Wilson (1): bonding/802.3ad: fix slave link initialization transition states Jiri Pirko (1): mlxsw: spectrum_acl: Avoid warning after identical rules insertion Jisheng Zhang (2): net: mvneta: Fix err code path of probe net: stmmac: fix reset gpio free missing Junwei Hu (1): tipc: fix modprobe tipc failed after switch order of device registration Kloetzke Jan (1): usbnet: fix kernel crash after disconnect Maxime Chevallier (1): ethtool: Check for vlan etype or vlan tci when parsing flow_rule Michael Chan (3): bnxt_en: Fix aggregation buffer leak under OOM condition. bnxt_en: Fix possible BUG() condition when calling pci_disable_msix(). bnxt_en: Reduce memory usage when running in kdump kernel. Mike Manning (1): ipv6: Consider sk_bound_dev_if when binding a raw socket to an address Parav Pandit (2): net/mlx5: Avoid double free in fs init error unwinding path net/mlx5: Allocate root ns memory using kzalloc to match kfree Raju Rangoju (1): cxgb4: offload VLAN flows regardless of VLAN ethtype Rasmus Villemoes (1): net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT Russell King (1): net: phy: marvell10g: report if the PHY fails to boot firmware Saeed Mahameed (1): net/mlx5e: Disable rxhash when CQE compress is enabled Tan, Tee Min (1): net: stmmac: fix ethtool flow control not able to get/set Vishal Kulkarni (1): cxgb4: Revert "cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size" Vlad Buslov (1): net: sched: don't use tc_action->order during action dump Weifeng Voon (1): net: stmmac: dma channel control register need to be init first Willem de Bruijn (1): net: correct zerocopy refcnt with udp MSG_MORE

6 years, 7 months

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror June 2019