April 2024 - Linux-stable-mirror

FAILED: patch "[PATCH] btrfs: fix race when detecting delalloc ranges during fiemap" failed to apply to 6.7-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.7-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.7.y git checkout FETCH_HEAD git cherry-pick -x 978b63f7464abcfd364a6c95f734282c50f3decf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040118-pebble-afoot-d19f@gregkh' --subject-prefix 'PATCH 6.7.y' HEAD^.. Possible dependencies: 978b63f7464a ("btrfs: fix race when detecting delalloc ranges during fiemap") 418b09027743 ("btrfs: ensure fiemap doesn't race with writes when FIEMAP_FLAG_SYNC is given") a1a4a9ca77f1 ("btrfs: fix race between ordered extent completion and fiemap") b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 978b63f7464abcfd364a6c95f734282c50f3decf Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Wed, 28 Feb 2024 11:37:56 +0000 Subject: [PATCH] btrfs: fix race when detecting delalloc ranges during fiemap For fiemap we recently stopped locking the target extent range for the whole duration of the fiemap call, in order to avoid a deadlock in a scenario where the fiemap buffer happens to be a memory mapped range of the same file. This use case is very unlikely to be useful in practice but it may be triggered by fuzz testing (syzbot, etc). This however introduced a race that makes us miss delalloc ranges for file regions that are currently holes, so the caller of fiemap will not be aware that there's data for some file regions. This can be quite serious for some use cases - for example in coreutils versions before 9.0, the cp program used fiemap to detect holes and data in the source file, copying only regions with data (extents or delalloc) from the source file to the destination file in order to preserve holes (see the documentation for its --sparse command line option). This means that if cp was used with a source file that had delalloc in a hole, the destination file could end up without that data, which is effectively a data loss issue, if it happened to hit the race described below. The race happens like this: 1) Fiemap is called, without the FIEMAP_FLAG_SYNC flag, for a file that has delalloc in the file range [64M, 65M[, which is currently a hole; 2) Fiemap locks the inode in shared mode, then starts iterating the inode's subvolume tree searching for file extent items, without having the whole fiemap target range locked in the inode's io tree - the change introduced recently by commit b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking"). It only locks ranges in the io tree when it finds a hole or prealloc extent since that commit; 3) Note that fiemap clones each leaf before using it, and this is to avoid deadlocks when locking a file range in the inode's io tree and the fiemap buffer is memory mapped to some file, because writing to the page with btrfs_page_mkwrite() will wait on any ordered extent for the page's range and the ordered extent needs to lock the range and may need to modify the same leaf, therefore leading to a deadlock on the leaf; 4) While iterating the file extent items in the cloned leaf before finding the hole in the range [64M, 65M[, the delalloc in that range is flushed and its ordered extent completes - meaning the corresponding file extent item is in the inode's subvolume tree, but not present in the cloned leaf that fiemap is iterating over; 5) When fiemap finds the hole in the [64M, 65M[ range by seeing the gap in the cloned leaf (or a file extent item with disk_bytenr == 0 in case the NO_HOLES feature is not enabled), it will lock that file range in the inode's io tree and then search for delalloc by checking for the EXTENT_DELALLOC bit in the io tree for that range and ordered extents (with btrfs_find_delalloc_in_range()). But it finds nothing since the delalloc in that range was already flushed and the ordered extent completed and is gone - as a result fiemap will not report that there's delalloc or an extent for the range [64M, 65M[, so user space will be mislead into thinking that there's a hole in that range. This could actually be sporadically triggered with test case generic/094 from fstests, which reports a missing extent/delalloc range like this: generic/094 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad) --- tests/generic/094.out 2020-06-10 19:29:03.830519425 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad 2024-02-28 11:00:00.381071525 +0000 @@ -1,3 +1,9 @@ QA output created by 094 fiemap run with sync fiemap run without sync +ERROR: couldn't find extent at 7 +map is 'HHDDHPPDPHPH' +logical: [ 5.. 6] phys: 301517.. 301518 flags: 0x800 tot: 2 +logical: [ 8.. 8] phys: 301520.. 301520 flags: 0x800 tot: 1 ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/094.out /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad' to see the entire diff) So in order to fix this, while still avoiding deadlocks in the case where the fiemap buffer is memory mapped to the same file, change fiemap to work like the following: 1) Always lock the whole range in the inode's io tree before starting to iterate the inode's subvolume tree searching for file extent items, just like we did before commit b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking"); 2) Now instead of writing to the fiemap buffer every time we have an extent to report, write instead to a temporary buffer (1 page), and when that buffer becomes full, stop iterating the file extent items, unlock the range in the io tree, release the search path, submit all the entries kept in that buffer to the fiemap buffer, and then resume the search for file extent items after locking again the remainder of the range in the io tree. The buffer having a size of a page, allows for 146 entries in a system with 4K pages. This is a large enough value to have a good performance by avoiding too many restarts of the search for file extent items. In other words this preserves the huge performance gains made in the last two years to fiemap, while avoiding the deadlocks in case the fiemap buffer is memory mapped to the same file (useless in practice, but possible and exercised by fuzz testing and syzbot). Fixes: b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking") Reviewed-by: Josef Bacik <josef(a)toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e6a2b6eb89e1..fbb05b0f7ebc 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2453,12 +2453,65 @@ int try_release_extent_mapping(struct page *page, gfp_t mask) return try_release_extent_state(tree, page, mask); } +struct btrfs_fiemap_entry { + u64 offset; + u64 phys; + u64 len; + u32 flags; +}; + /* - * To cache previous fiemap extent + * Indicate the caller of emit_fiemap_extent() that it needs to unlock the file + * range from the inode's io tree, unlock the subvolume tree search path, flush + * the fiemap cache and relock the file range and research the subvolume tree. + * The value here is something negative that can't be confused with a valid + * errno value and different from 1 because that's also a return value from + * fiemap_fill_next_extent() and also it's often used to mean some btree search + * did not find a key, so make it some distinct negative value. + */ +#define BTRFS_FIEMAP_FLUSH_CACHE (-(MAX_ERRNO + 1)) + +/* + * Used to: * - * Will be used for merging fiemap extent + * - Cache the next entry to be emitted to the fiemap buffer, so that we can + * merge extents that are contiguous and can be grouped as a single one; + * + * - Store extents ready to be written to the fiemap buffer in an intermediary + * buffer. This intermediary buffer is to ensure that in case the fiemap + * buffer is memory mapped to the fiemap target file, we don't deadlock + * during btrfs_page_mkwrite(). This is because during fiemap we are locking + * an extent range in order to prevent races with delalloc flushing and + * ordered extent completion, which is needed in order to reliably detect + * delalloc in holes and prealloc extents. And this can lead to a deadlock + * if the fiemap buffer is memory mapped to the file we are running fiemap + * against (a silly, useless in practice scenario, but possible) because + * btrfs_page_mkwrite() will try to lock the same extent range. */ struct fiemap_cache { + /* An array of ready fiemap entries. */ + struct btrfs_fiemap_entry *entries; + /* Number of entries in the entries array. */ + int entries_size; + /* Index of the next entry in the entries array to write to. */ + int entries_pos; + /* + * Once the entries array is full, this indicates what's the offset for + * the next file extent item we must search for in the inode's subvolume + * tree after unlocking the extent range in the inode's io tree and + * releasing the search path. + */ + u64 next_search_offset; + /* + * This matches struct fiemap_extent_info::fi_mapped_extents, we use it + * to count ourselves emitted extents and stop instead of relying on + * fiemap_fill_next_extent() because we buffer ready fiemap entries at + * the @entries array, and we want to stop as soon as we hit the max + * amount of extents to map, not just to save time but also to make the + * logic at extent_fiemap() simpler. + */ + unsigned int extents_mapped; + /* Fields for the cached extent (unsubmitted, not ready, extent). */ u64 offset; u64 phys; u64 len; @@ -2466,6 +2519,28 @@ struct fiemap_cache { bool cached; }; +static int flush_fiemap_cache(struct fiemap_extent_info *fieinfo, + struct fiemap_cache *cache) +{ + for (int i = 0; i < cache->entries_pos; i++) { + struct btrfs_fiemap_entry *entry = &cache->entries[i]; + int ret; + + ret = fiemap_fill_next_extent(fieinfo, entry->offset, + entry->phys, entry->len, + entry->flags); + /* + * Ignore 1 (reached max entries) because we keep track of that + * ourselves in emit_fiemap_extent(). + */ + if (ret < 0) + return ret; + } + cache->entries_pos = 0; + + return 0; +} + /* * Helper to submit fiemap extent. * @@ -2480,8 +2555,8 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, struct fiemap_cache *cache, u64 offset, u64 phys, u64 len, u32 flags) { + struct btrfs_fiemap_entry *entry; u64 cache_end; - int ret = 0; /* Set at the end of extent_fiemap(). */ ASSERT((flags & FIEMAP_EXTENT_LAST) == 0); @@ -2494,7 +2569,9 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, * find an extent that starts at an offset behind the end offset of the * previous extent we processed. This happens if fiemap is called * without FIEMAP_FLAG_SYNC and there are ordered extents completing - * while we call btrfs_next_leaf() (through fiemap_next_leaf_item()). + * after we had to unlock the file range, release the search path, emit + * the fiemap extents stored in the buffer (cache->entries array) and + * the lock the remainder of the range and re-search the btree. * * For example we are in leaf X processing its last item, which is the * file extent item for file range [512K, 1M[, and after @@ -2607,11 +2684,35 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, emit: /* Not mergeable, need to submit cached one */ - ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys, - cache->len, cache->flags); - cache->cached = false; - if (ret) - return ret; + + if (cache->entries_pos == cache->entries_size) { + /* + * We will need to research for the end offset of the last + * stored extent and not from the current offset, because after + * unlocking the range and releasing the path, if there's a hole + * between that end offset and this current offset, a new extent + * may have been inserted due to a new write, so we don't want + * to miss it. + */ + entry = &cache->entries[cache->entries_size - 1]; + cache->next_search_offset = entry->offset + entry->len; + cache->cached = false; + + return BTRFS_FIEMAP_FLUSH_CACHE; + } + + entry = &cache->entries[cache->entries_pos]; + entry->offset = cache->offset; + entry->phys = cache->phys; + entry->len = cache->len; + entry->flags = cache->flags; + cache->entries_pos++; + cache->extents_mapped++; + + if (cache->extents_mapped == fieinfo->fi_extents_max) { + cache->cached = false; + return 1; + } assign: cache->cached = true; cache->offset = offset; @@ -2737,8 +2838,8 @@ static int fiemap_search_slot(struct btrfs_inode *inode, struct btrfs_path *path * neighbour leaf). * We also need the private clone because holding a read lock on an * extent buffer of the subvolume's b+tree will make lockdep unhappy - * when we call fiemap_fill_next_extent(), because that may cause a page - * fault when filling the user space buffer with fiemap data. + * when we check if extents are shared, as backref walking may need to + * lock the same leaf we are processing. */ clone = btrfs_clone_extent_buffer(path->nodes[0]); if (!clone) @@ -2778,34 +2879,16 @@ static int fiemap_process_hole(struct btrfs_inode *inode, * it beyond i_size. */ while (cur_offset < end && cur_offset < i_size) { - struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; u64 prealloc_start; - u64 lockstart; - u64 lockend; u64 prealloc_len = 0; bool delalloc; - lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize); - lockend = round_up(end, inode->root->fs_info->sectorsize); - - /* - * We are only locking for the delalloc range because that's the - * only thing that can change here. With fiemap we have a lock - * on the inode, so no buffered or direct writes can happen. - * - * However mmaps and normal page writeback will cause this to - * change arbitrarily. We have to lock the extent lock here to - * make sure that nobody messes with the tree while we're doing - * btrfs_find_delalloc_in_range. - */ - lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end, delalloc_cached_state, &delalloc_start, &delalloc_end); - unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) break; @@ -2973,6 +3056,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { const u64 ino = btrfs_ino(inode); + struct extent_state *cached_state = NULL; struct extent_state *delalloc_cached_state = NULL; struct btrfs_path *path; struct fiemap_cache cache = { 0 }; @@ -2985,26 +3069,33 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, bool stopped = false; int ret; + cache.entries_size = PAGE_SIZE / sizeof(struct btrfs_fiemap_entry); + cache.entries = kmalloc_array(cache.entries_size, + sizeof(struct btrfs_fiemap_entry), + GFP_KERNEL); backref_ctx = btrfs_alloc_backref_share_check_ctx(); path = btrfs_alloc_path(); - if (!backref_ctx || !path) { + if (!cache.entries || !backref_ctx || !path) { ret = -ENOMEM; goto out; } +restart: range_start = round_down(start, sectorsize); range_end = round_up(start + len, sectorsize); prev_extent_end = range_start; + lock_extent(&inode->io_tree, range_start, range_end, &cached_state); + ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end); if (ret < 0) - goto out; + goto out_unlock; btrfs_release_path(path); path->reada = READA_FORWARD; ret = fiemap_search_slot(inode, path, range_start); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* * No file extent item found, but we may have delalloc between @@ -3051,7 +3142,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, backref_ctx, 0, 0, 0, prev_extent_end, hole_end); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* fiemap_fill_next_extent() told us to stop. */ stopped = true; @@ -3107,7 +3198,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, extent_gen, backref_ctx); if (ret < 0) - goto out; + goto out_unlock; else if (ret > 0) flags |= FIEMAP_EXTENT_SHARED; } @@ -3118,9 +3209,9 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { - /* fiemap_fill_next_extent() told us to stop. */ + /* emit_fiemap_extent() told us to stop. */ stopped = true; break; } @@ -3129,12 +3220,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, next_item: if (fatal_signal_pending(current)) { ret = -EINTR; - goto out; + goto out_unlock; } ret = fiemap_next_leaf_item(inode, path); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* No more file extent items for this inode. */ break; @@ -3143,22 +3234,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } check_eof_delalloc: - /* - * Release (and free) the path before emitting any final entries to - * fiemap_fill_next_extent() to keep lockdep happy. This is because - * once we find no more file extent items exist, we may have a - * non-cloned leaf, and fiemap_fill_next_extent() can trigger page - * faults when copying data to the user space buffer. - */ - btrfs_free_path(path); - path = NULL; - if (!stopped && prev_extent_end < range_end) { ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx, 0, 0, 0, prev_extent_end, range_end - 1); if (ret < 0) - goto out; + goto out_unlock; prev_extent_end = range_end; } @@ -3166,28 +3247,16 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, const u64 i_size = i_size_read(&inode->vfs_inode); if (prev_extent_end < i_size) { - struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; - u64 lockstart; - u64 lockend; bool delalloc; - lockstart = round_down(prev_extent_end, sectorsize); - lockend = round_up(i_size, sectorsize); - - /* - * See the comment in fiemap_process_hole as to why - * we're doing the locking here. - */ - lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, prev_extent_end, i_size - 1, &delalloc_cached_state, &delalloc_start, &delalloc_end); - unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) cache.flags |= FIEMAP_EXTENT_LAST; } else { @@ -3195,9 +3264,39 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } } +out_unlock: + unlock_extent(&inode->io_tree, range_start, range_end, &cached_state); + + if (ret == BTRFS_FIEMAP_FLUSH_CACHE) { + btrfs_release_path(path); + ret = flush_fiemap_cache(fieinfo, &cache); + if (ret) + goto out; + len -= cache.next_search_offset - start; + start = cache.next_search_offset; + goto restart; + } else if (ret < 0) { + goto out; + } + + /* + * Must free the path before emitting to the fiemap buffer because we + * may have a non-cloned leaf and if the fiemap buffer is memory mapped + * to a file, a write into it (through btrfs_page_mkwrite()) may trigger + * waiting for an ordered extent that in order to complete needs to + * modify that leaf, therefore leading to a deadlock. + */ + btrfs_free_path(path); + path = NULL; + + ret = flush_fiemap_cache(fieinfo, &cache); + if (ret) + goto out; + ret = emit_last_fiemap_cache(fieinfo, &cache); out: free_extent_state(delalloc_cached_state); + kfree(cache.entries); btrfs_free_backref_share_ctx(backref_ctx); btrfs_free_path(path); return ret;

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] btrfs: fix race when detecting delalloc ranges during fiemap" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 978b63f7464abcfd364a6c95f734282c50f3decf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040115-paparazzi-shortcut-137f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: 978b63f7464a ("btrfs: fix race when detecting delalloc ranges during fiemap") 418b09027743 ("btrfs: ensure fiemap doesn't race with writes when FIEMAP_FLAG_SYNC is given") a1a4a9ca77f1 ("btrfs: fix race between ordered extent completion and fiemap") b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 978b63f7464abcfd364a6c95f734282c50f3decf Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Wed, 28 Feb 2024 11:37:56 +0000 Subject: [PATCH] btrfs: fix race when detecting delalloc ranges during fiemap For fiemap we recently stopped locking the target extent range for the whole duration of the fiemap call, in order to avoid a deadlock in a scenario where the fiemap buffer happens to be a memory mapped range of the same file. This use case is very unlikely to be useful in practice but it may be triggered by fuzz testing (syzbot, etc). This however introduced a race that makes us miss delalloc ranges for file regions that are currently holes, so the caller of fiemap will not be aware that there's data for some file regions. This can be quite serious for some use cases - for example in coreutils versions before 9.0, the cp program used fiemap to detect holes and data in the source file, copying only regions with data (extents or delalloc) from the source file to the destination file in order to preserve holes (see the documentation for its --sparse command line option). This means that if cp was used with a source file that had delalloc in a hole, the destination file could end up without that data, which is effectively a data loss issue, if it happened to hit the race described below. The race happens like this: 1) Fiemap is called, without the FIEMAP_FLAG_SYNC flag, for a file that has delalloc in the file range [64M, 65M[, which is currently a hole; 2) Fiemap locks the inode in shared mode, then starts iterating the inode's subvolume tree searching for file extent items, without having the whole fiemap target range locked in the inode's io tree - the change introduced recently by commit b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking"). It only locks ranges in the io tree when it finds a hole or prealloc extent since that commit; 3) Note that fiemap clones each leaf before using it, and this is to avoid deadlocks when locking a file range in the inode's io tree and the fiemap buffer is memory mapped to some file, because writing to the page with btrfs_page_mkwrite() will wait on any ordered extent for the page's range and the ordered extent needs to lock the range and may need to modify the same leaf, therefore leading to a deadlock on the leaf; 4) While iterating the file extent items in the cloned leaf before finding the hole in the range [64M, 65M[, the delalloc in that range is flushed and its ordered extent completes - meaning the corresponding file extent item is in the inode's subvolume tree, but not present in the cloned leaf that fiemap is iterating over; 5) When fiemap finds the hole in the [64M, 65M[ range by seeing the gap in the cloned leaf (or a file extent item with disk_bytenr == 0 in case the NO_HOLES feature is not enabled), it will lock that file range in the inode's io tree and then search for delalloc by checking for the EXTENT_DELALLOC bit in the io tree for that range and ordered extents (with btrfs_find_delalloc_in_range()). But it finds nothing since the delalloc in that range was already flushed and the ordered extent completed and is gone - as a result fiemap will not report that there's delalloc or an extent for the range [64M, 65M[, so user space will be mislead into thinking that there's a hole in that range. This could actually be sporadically triggered with test case generic/094 from fstests, which reports a missing extent/delalloc range like this: generic/094 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad) --- tests/generic/094.out 2020-06-10 19:29:03.830519425 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad 2024-02-28 11:00:00.381071525 +0000 @@ -1,3 +1,9 @@ QA output created by 094 fiemap run with sync fiemap run without sync +ERROR: couldn't find extent at 7 +map is 'HHDDHPPDPHPH' +logical: [ 5.. 6] phys: 301517.. 301518 flags: 0x800 tot: 2 +logical: [ 8.. 8] phys: 301520.. 301520 flags: 0x800 tot: 1 ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/094.out /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad' to see the entire diff) So in order to fix this, while still avoiding deadlocks in the case where the fiemap buffer is memory mapped to the same file, change fiemap to work like the following: 1) Always lock the whole range in the inode's io tree before starting to iterate the inode's subvolume tree searching for file extent items, just like we did before commit b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking"); 2) Now instead of writing to the fiemap buffer every time we have an extent to report, write instead to a temporary buffer (1 page), and when that buffer becomes full, stop iterating the file extent items, unlock the range in the io tree, release the search path, submit all the entries kept in that buffer to the fiemap buffer, and then resume the search for file extent items after locking again the remainder of the range in the io tree. The buffer having a size of a page, allows for 146 entries in a system with 4K pages. This is a large enough value to have a good performance by avoiding too many restarts of the search for file extent items. In other words this preserves the huge performance gains made in the last two years to fiemap, while avoiding the deadlocks in case the fiemap buffer is memory mapped to the same file (useless in practice, but possible and exercised by fuzz testing and syzbot). Fixes: b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking") Reviewed-by: Josef Bacik <josef(a)toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e6a2b6eb89e1..fbb05b0f7ebc 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2453,12 +2453,65 @@ int try_release_extent_mapping(struct page *page, gfp_t mask) return try_release_extent_state(tree, page, mask); } +struct btrfs_fiemap_entry { + u64 offset; + u64 phys; + u64 len; + u32 flags; +}; + /* - * To cache previous fiemap extent + * Indicate the caller of emit_fiemap_extent() that it needs to unlock the file + * range from the inode's io tree, unlock the subvolume tree search path, flush + * the fiemap cache and relock the file range and research the subvolume tree. + * The value here is something negative that can't be confused with a valid + * errno value and different from 1 because that's also a return value from + * fiemap_fill_next_extent() and also it's often used to mean some btree search + * did not find a key, so make it some distinct negative value. + */ +#define BTRFS_FIEMAP_FLUSH_CACHE (-(MAX_ERRNO + 1)) + +/* + * Used to: * - * Will be used for merging fiemap extent + * - Cache the next entry to be emitted to the fiemap buffer, so that we can + * merge extents that are contiguous and can be grouped as a single one; + * + * - Store extents ready to be written to the fiemap buffer in an intermediary + * buffer. This intermediary buffer is to ensure that in case the fiemap + * buffer is memory mapped to the fiemap target file, we don't deadlock + * during btrfs_page_mkwrite(). This is because during fiemap we are locking + * an extent range in order to prevent races with delalloc flushing and + * ordered extent completion, which is needed in order to reliably detect + * delalloc in holes and prealloc extents. And this can lead to a deadlock + * if the fiemap buffer is memory mapped to the file we are running fiemap + * against (a silly, useless in practice scenario, but possible) because + * btrfs_page_mkwrite() will try to lock the same extent range. */ struct fiemap_cache { + /* An array of ready fiemap entries. */ + struct btrfs_fiemap_entry *entries; + /* Number of entries in the entries array. */ + int entries_size; + /* Index of the next entry in the entries array to write to. */ + int entries_pos; + /* + * Once the entries array is full, this indicates what's the offset for + * the next file extent item we must search for in the inode's subvolume + * tree after unlocking the extent range in the inode's io tree and + * releasing the search path. + */ + u64 next_search_offset; + /* + * This matches struct fiemap_extent_info::fi_mapped_extents, we use it + * to count ourselves emitted extents and stop instead of relying on + * fiemap_fill_next_extent() because we buffer ready fiemap entries at + * the @entries array, and we want to stop as soon as we hit the max + * amount of extents to map, not just to save time but also to make the + * logic at extent_fiemap() simpler. + */ + unsigned int extents_mapped; + /* Fields for the cached extent (unsubmitted, not ready, extent). */ u64 offset; u64 phys; u64 len; @@ -2466,6 +2519,28 @@ struct fiemap_cache { bool cached; }; +static int flush_fiemap_cache(struct fiemap_extent_info *fieinfo, + struct fiemap_cache *cache) +{ + for (int i = 0; i < cache->entries_pos; i++) { + struct btrfs_fiemap_entry *entry = &cache->entries[i]; + int ret; + + ret = fiemap_fill_next_extent(fieinfo, entry->offset, + entry->phys, entry->len, + entry->flags); + /* + * Ignore 1 (reached max entries) because we keep track of that + * ourselves in emit_fiemap_extent(). + */ + if (ret < 0) + return ret; + } + cache->entries_pos = 0; + + return 0; +} + /* * Helper to submit fiemap extent. * @@ -2480,8 +2555,8 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, struct fiemap_cache *cache, u64 offset, u64 phys, u64 len, u32 flags) { + struct btrfs_fiemap_entry *entry; u64 cache_end; - int ret = 0; /* Set at the end of extent_fiemap(). */ ASSERT((flags & FIEMAP_EXTENT_LAST) == 0); @@ -2494,7 +2569,9 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, * find an extent that starts at an offset behind the end offset of the * previous extent we processed. This happens if fiemap is called * without FIEMAP_FLAG_SYNC and there are ordered extents completing - * while we call btrfs_next_leaf() (through fiemap_next_leaf_item()). + * after we had to unlock the file range, release the search path, emit + * the fiemap extents stored in the buffer (cache->entries array) and + * the lock the remainder of the range and re-search the btree. * * For example we are in leaf X processing its last item, which is the * file extent item for file range [512K, 1M[, and after @@ -2607,11 +2684,35 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, emit: /* Not mergeable, need to submit cached one */ - ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys, - cache->len, cache->flags); - cache->cached = false; - if (ret) - return ret; + + if (cache->entries_pos == cache->entries_size) { + /* + * We will need to research for the end offset of the last + * stored extent and not from the current offset, because after + * unlocking the range and releasing the path, if there's a hole + * between that end offset and this current offset, a new extent + * may have been inserted due to a new write, so we don't want + * to miss it. + */ + entry = &cache->entries[cache->entries_size - 1]; + cache->next_search_offset = entry->offset + entry->len; + cache->cached = false; + + return BTRFS_FIEMAP_FLUSH_CACHE; + } + + entry = &cache->entries[cache->entries_pos]; + entry->offset = cache->offset; + entry->phys = cache->phys; + entry->len = cache->len; + entry->flags = cache->flags; + cache->entries_pos++; + cache->extents_mapped++; + + if (cache->extents_mapped == fieinfo->fi_extents_max) { + cache->cached = false; + return 1; + } assign: cache->cached = true; cache->offset = offset; @@ -2737,8 +2838,8 @@ static int fiemap_search_slot(struct btrfs_inode *inode, struct btrfs_path *path * neighbour leaf). * We also need the private clone because holding a read lock on an * extent buffer of the subvolume's b+tree will make lockdep unhappy - * when we call fiemap_fill_next_extent(), because that may cause a page - * fault when filling the user space buffer with fiemap data. + * when we check if extents are shared, as backref walking may need to + * lock the same leaf we are processing. */ clone = btrfs_clone_extent_buffer(path->nodes[0]); if (!clone) @@ -2778,34 +2879,16 @@ static int fiemap_process_hole(struct btrfs_inode *inode, * it beyond i_size. */ while (cur_offset < end && cur_offset < i_size) { - struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; u64 prealloc_start; - u64 lockstart; - u64 lockend; u64 prealloc_len = 0; bool delalloc; - lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize); - lockend = round_up(end, inode->root->fs_info->sectorsize); - - /* - * We are only locking for the delalloc range because that's the - * only thing that can change here. With fiemap we have a lock - * on the inode, so no buffered or direct writes can happen. - * - * However mmaps and normal page writeback will cause this to - * change arbitrarily. We have to lock the extent lock here to - * make sure that nobody messes with the tree while we're doing - * btrfs_find_delalloc_in_range. - */ - lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end, delalloc_cached_state, &delalloc_start, &delalloc_end); - unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) break; @@ -2973,6 +3056,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { const u64 ino = btrfs_ino(inode); + struct extent_state *cached_state = NULL; struct extent_state *delalloc_cached_state = NULL; struct btrfs_path *path; struct fiemap_cache cache = { 0 }; @@ -2985,26 +3069,33 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, bool stopped = false; int ret; + cache.entries_size = PAGE_SIZE / sizeof(struct btrfs_fiemap_entry); + cache.entries = kmalloc_array(cache.entries_size, + sizeof(struct btrfs_fiemap_entry), + GFP_KERNEL); backref_ctx = btrfs_alloc_backref_share_check_ctx(); path = btrfs_alloc_path(); - if (!backref_ctx || !path) { + if (!cache.entries || !backref_ctx || !path) { ret = -ENOMEM; goto out; } +restart: range_start = round_down(start, sectorsize); range_end = round_up(start + len, sectorsize); prev_extent_end = range_start; + lock_extent(&inode->io_tree, range_start, range_end, &cached_state); + ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end); if (ret < 0) - goto out; + goto out_unlock; btrfs_release_path(path); path->reada = READA_FORWARD; ret = fiemap_search_slot(inode, path, range_start); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* * No file extent item found, but we may have delalloc between @@ -3051,7 +3142,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, backref_ctx, 0, 0, 0, prev_extent_end, hole_end); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* fiemap_fill_next_extent() told us to stop. */ stopped = true; @@ -3107,7 +3198,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, extent_gen, backref_ctx); if (ret < 0) - goto out; + goto out_unlock; else if (ret > 0) flags |= FIEMAP_EXTENT_SHARED; } @@ -3118,9 +3209,9 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { - /* fiemap_fill_next_extent() told us to stop. */ + /* emit_fiemap_extent() told us to stop. */ stopped = true; break; } @@ -3129,12 +3220,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, next_item: if (fatal_signal_pending(current)) { ret = -EINTR; - goto out; + goto out_unlock; } ret = fiemap_next_leaf_item(inode, path); if (ret < 0) { - goto out; + goto out_unlock; } else if (ret > 0) { /* No more file extent items for this inode. */ break; @@ -3143,22 +3234,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } check_eof_delalloc: - /* - * Release (and free) the path before emitting any final entries to - * fiemap_fill_next_extent() to keep lockdep happy. This is because - * once we find no more file extent items exist, we may have a - * non-cloned leaf, and fiemap_fill_next_extent() can trigger page - * faults when copying data to the user space buffer. - */ - btrfs_free_path(path); - path = NULL; - if (!stopped && prev_extent_end < range_end) { ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx, 0, 0, 0, prev_extent_end, range_end - 1); if (ret < 0) - goto out; + goto out_unlock; prev_extent_end = range_end; } @@ -3166,28 +3247,16 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, const u64 i_size = i_size_read(&inode->vfs_inode); if (prev_extent_end < i_size) { - struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; - u64 lockstart; - u64 lockend; bool delalloc; - lockstart = round_down(prev_extent_end, sectorsize); - lockend = round_up(i_size, sectorsize); - - /* - * See the comment in fiemap_process_hole as to why - * we're doing the locking here. - */ - lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, prev_extent_end, i_size - 1, &delalloc_cached_state, &delalloc_start, &delalloc_end); - unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) cache.flags |= FIEMAP_EXTENT_LAST; } else { @@ -3195,9 +3264,39 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, } } +out_unlock: + unlock_extent(&inode->io_tree, range_start, range_end, &cached_state); + + if (ret == BTRFS_FIEMAP_FLUSH_CACHE) { + btrfs_release_path(path); + ret = flush_fiemap_cache(fieinfo, &cache); + if (ret) + goto out; + len -= cache.next_search_offset - start; + start = cache.next_search_offset; + goto restart; + } else if (ret < 0) { + goto out; + } + + /* + * Must free the path before emitting to the fiemap buffer because we + * may have a non-cloned leaf and if the fiemap buffer is memory mapped + * to a file, a write into it (through btrfs_page_mkwrite()) may trigger + * waiting for an ordered extent that in order to complete needs to + * modify that leaf, therefore leading to a deadlock. + */ + btrfs_free_path(path); + path = NULL; + + ret = flush_fiemap_cache(fieinfo, &cache); + if (ret) + goto out; + ret = emit_last_fiemap_cache(fieinfo, &cache); out: free_extent_state(delalloc_cached_state); + kfree(cache.entries); btrfs_free_backref_share_ctx(backref_ctx); btrfs_free_path(path); return ret;

1 year, 8 months

1
0
0 0

[PATCH 4.19 v2 0/2] Fix stable-4.19 use-after-free bug

by George Guo

v2: - add branch info the patch belongs to. - add upstream commit id (these patches are equivalent fix ones). George Guo (2): tracing: Remove unnecessary hist_data destroy in destroy_synth_var_refs() tracing: Remove unnecessary var destroy in onmax_destroy() kernel/trace/trace_events_hist.c | 27 ++------------------------- 1 file changed, 2 insertions(+), 25 deletions(-) -- 2.34.1

1 year, 8 months

2
4
0 0

FAILED: patch "[PATCH] crash: use macro to add crashk_res into iomem early for" failed to apply to 6.8-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.8-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y git checkout FETCH_HEAD git cherry-pick -x 32fbe5246582af4f611ccccee33fd6e559087252 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033005-graded-dangle-3a21@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^.. Possible dependencies: 32fbe5246582 ("crash: use macro to add crashk_res into iomem early for specific arch") 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 32fbe5246582af4f611ccccee33fd6e559087252 Mon Sep 17 00:00:00 2001 From: Baoquan He <bhe(a)redhat.com> Date: Mon, 25 Mar 2024 09:50:50 +0800 Subject: [PATCH] crash: use macro to add crashk_res into iomem early for specific arch There are regression reports[1][2] that crashkernel region on x86_64 can't be added into iomem tree sometime. This causes the later failure of kdump loading. This happened after commit 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") was merged. Even though, these reported issues are proved to be related to other component, they are just exposed after above commmit applied, I still would like to keep crashk_res and crashk_low_res being added into iomem early as before because the early adding has been always there on x86_64 and working very well. For safety of kdump, Let's change it back. Here, add a macro HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY to limit that only ARCH defining the macro can have the early adding crashk_res/_low_res into iomem. Then define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY on x86 to enable it. Note: In reserve_crashkernel_low(), there's a remnant of crashk_low_res handling which was mistakenly added back in commit 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c"). [1] [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data https://lore.kernel.org/all/Zfv8iCL6CT2JqLIC@darkstar.users.ipa.redhat.com/… [2] Question about Address Range Validation in Crash Kernel Allocation https://lore.kernel.org/all/4eeac1f733584855965a2ea62fa4da58@huawei.com/T/#u Link: https://lkml.kernel.org/r/ZgDYemRQ2jxjLkq+@MiWiFi-R3L-srv Fixes: 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") Signed-off-by: Baoquan He <bhe(a)redhat.com> Cc: Dave Young <dyoung(a)redhat.com> Cc: Huacai Chen <chenhuacai(a)loongson.cn> Cc: Ingo Molnar <mingo(a)kernel.org> Cc: Jiri Bohac <jbohac(a)suse.cz> Cc: Li Huafei <lihuafei1(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/arch/x86/include/asm/crash_reserve.h b/arch/x86/include/asm/crash_reserve.h index 152239f95541..7835b2cdff04 100644 --- a/arch/x86/include/asm/crash_reserve.h +++ b/arch/x86/include/asm/crash_reserve.h @@ -39,4 +39,6 @@ static inline unsigned long crash_low_size_default(void) #endif } +#define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY + #endif /* _X86_CRASH_RESERVE_H */ diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c index bbb6c3cb00e4..066668799f75 100644 --- a/kernel/crash_reserve.c +++ b/kernel/crash_reserve.c @@ -366,7 +366,9 @@ static int __init reserve_crashkernel_low(unsigned long long low_size) crashk_low_res.start = low_base; crashk_low_res.end = low_base + low_size - 1; +#ifdef HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY insert_resource(&iomem_resource, &crashk_low_res); +#endif #endif return 0; } @@ -448,8 +450,12 @@ void __init reserve_crashkernel_generic(char *cmdline, crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; +#ifdef HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY + insert_resource(&iomem_resource, &crashk_res); +#endif } +#ifndef HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY static __init int insert_crashkernel_resources(void) { if (crashk_res.start < crashk_res.end) @@ -462,3 +468,4 @@ static __init int insert_crashkernel_resources(void) } early_initcall(insert_crashkernel_resources); #endif +#endif

1 year, 8 months

3
2
0 0

FAILED: patch "[PATCH] scsi: sd: Fix TCG OPAL unlock on system resume" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 0c76106cb97548810214def8ee22700bbbb90543 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040127-defraud-ladle-60f4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") 99398d2070ab ("scsi: sd: Do not issue commands to suspended disks on shutdown") 8b4d9469d0b0 ("ata: libata-scsi: Fix delayed scsi_rescan_device() execution") ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices") aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop") 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") 2a5a4326e583 ("Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 0c76106cb97548810214def8ee22700bbbb90543 Mon Sep 17 00:00:00 2001 From: Damien Le Moal <dlemoal(a)kernel.org> Date: Tue, 19 Mar 2024 16:12:09 +0900 Subject: [PATCH] scsi: sd: Fix TCG OPAL unlock on system resume Commit 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") introduced the manage_system_start_stop scsi_device flag to allow libata to indicate to the SCSI disk driver that nothing should be done when resuming a disk on system resume. This change turned the execution of sd_resume() into a no-op for ATA devices on system resume. While this solved deadlock issues during device resume, this change also wrongly removed the execution of opal_unlock_from_suspend(). As a result, devices with TCG OPAL locking enabled remain locked and inaccessible after a system resume from sleep. To fix this issue, introduce the SCSI driver resume method and implement it with the sd_resume() function calling opal_unlock_from_suspend(). The former sd_resume() function is renamed to sd_resume_common() and modified to call the new sd_resume() function. For non-ATA devices, this result in no functional changes. In order for libata to explicitly execute sd_resume() when a device is resumed during system restart, the function scsi_resume_device() is introduced. libata calls this function from the revalidation work executed on devie resume, a state that is indicated with the new device flag ATA_DFLAG_RESUMING. Doing so, locked TCG OPAL enabled devices are unlocked on resume, allowing normal operation. Fixes: 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218538 Cc: stable(a)vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org> Link: https://lore.kernel.org/r/20240319071209.1179257-1-dlemoal@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index b0d6e69c4a5b..214b935c2ced 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -712,8 +712,10 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap) ehc->saved_ncq_enabled |= 1 << devno; /* If we are resuming, wake up the device */ - if (ap->pflags & ATA_PFLAG_RESUMING) + if (ap->pflags & ATA_PFLAG_RESUMING) { + dev->flags |= ATA_DFLAG_RESUMING; ehc->i.dev_action[devno] |= ATA_EH_SET_ACTIVE; + } } } @@ -3169,6 +3171,7 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link, return 0; err: + dev->flags &= ~ATA_DFLAG_RESUMING; *r_failed_dev = dev; return rc; } diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 0a0f483124c3..2f4c58837641 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -4730,6 +4730,7 @@ void ata_scsi_dev_rescan(struct work_struct *work) struct ata_link *link; struct ata_device *dev; unsigned long flags; + bool do_resume; int ret = 0; mutex_lock(&ap->scsi_scan_mutex); @@ -4751,7 +4752,15 @@ void ata_scsi_dev_rescan(struct work_struct *work) if (scsi_device_get(sdev)) continue; + do_resume = dev->flags & ATA_DFLAG_RESUMING; + spin_unlock_irqrestore(ap->lock, flags); + if (do_resume) { + ret = scsi_resume_device(sdev); + if (ret == -EWOULDBLOCK) + goto unlock; + dev->flags &= ~ATA_DFLAG_RESUMING; + } ret = scsi_rescan_device(sdev); scsi_device_put(sdev); spin_lock_irqsave(ap->lock, flags); diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 8d06475de17a..ffd7e7e72933 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1642,6 +1642,40 @@ int scsi_add_device(struct Scsi_Host *host, uint channel, } EXPORT_SYMBOL(scsi_add_device); +int scsi_resume_device(struct scsi_device *sdev) +{ + struct device *dev = &sdev->sdev_gendev; + int ret = 0; + + device_lock(dev); + + /* + * Bail out if the device or its queue are not running. Otherwise, + * the rescan may block waiting for commands to be executed, with us + * holding the device lock. This can result in a potential deadlock + * in the power management core code when system resume is on-going. + */ + if (sdev->sdev_state != SDEV_RUNNING || + blk_queue_pm_only(sdev->request_queue)) { + ret = -EWOULDBLOCK; + goto unlock; + } + + if (dev->driver && try_module_get(dev->driver->owner)) { + struct scsi_driver *drv = to_scsi_driver(dev->driver); + + if (drv->resume) + ret = drv->resume(dev); + module_put(dev->driver->owner); + } + +unlock: + device_unlock(dev); + + return ret; +} +EXPORT_SYMBOL(scsi_resume_device); + int scsi_rescan_device(struct scsi_device *sdev) { struct device *dev = &sdev->sdev_gendev; diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index ccff8f2e2e75..3cf898670290 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -4108,7 +4108,21 @@ static int sd_suspend_runtime(struct device *dev) return sd_suspend_common(dev, true); } -static int sd_resume(struct device *dev, bool runtime) +static int sd_resume(struct device *dev) +{ + struct scsi_disk *sdkp = dev_get_drvdata(dev); + + sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); + + if (opal_unlock_from_suspend(sdkp->opal_dev)) { + sd_printk(KERN_NOTICE, sdkp, "OPAL unlock failed\n"); + return -EIO; + } + + return 0; +} + +static int sd_resume_common(struct device *dev, bool runtime) { struct scsi_disk *sdkp = dev_get_drvdata(dev); int ret; @@ -4124,7 +4138,7 @@ static int sd_resume(struct device *dev, bool runtime) sd_printk(KERN_NOTICE, sdkp, "Starting disk\n"); ret = sd_start_stop_device(sdkp, 1); if (!ret) { - opal_unlock_from_suspend(sdkp->opal_dev); + sd_resume(dev); sdkp->suspended = false; } @@ -4143,7 +4157,7 @@ static int sd_resume_system(struct device *dev) return 0; } - return sd_resume(dev, false); + return sd_resume_common(dev, false); } static int sd_resume_runtime(struct device *dev) @@ -4170,7 +4184,7 @@ static int sd_resume_runtime(struct device *dev) "Failed to clear sense data\n"); } - return sd_resume(dev, true); + return sd_resume_common(dev, true); } static const struct dev_pm_ops sd_pm_ops = { @@ -4193,6 +4207,7 @@ static struct scsi_driver sd_template = { .pm = &sd_pm_ops, }, .rescan = sd_rescan, + .resume = sd_resume, .init_command = sd_init_command, .uninit_command = sd_uninit_command, .done = sd_done, diff --git a/include/linux/libata.h b/include/linux/libata.h index 26d68115afb8..324d792e7c78 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -107,6 +107,7 @@ enum { ATA_DFLAG_NCQ_PRIO_ENABLED = (1 << 20), /* Priority cmds sent to dev */ ATA_DFLAG_CDL_ENABLED = (1 << 21), /* cmd duration limits is enabled */ + ATA_DFLAG_RESUMING = (1 << 22), /* Device is resuming */ ATA_DFLAG_DETACH = (1 << 24), ATA_DFLAG_DETACHED = (1 << 25), ATA_DFLAG_DA = (1 << 26), /* device supports Device Attention */ diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h index 4ce1988b2ba0..f40915d2ecee 100644 --- a/include/scsi/scsi_driver.h +++ b/include/scsi/scsi_driver.h @@ -12,6 +12,7 @@ struct request; struct scsi_driver { struct device_driver gendrv; + int (*resume)(struct device *); void (*rescan)(struct device *); blk_status_t (*init_command)(struct scsi_cmnd *); void (*uninit_command)(struct scsi_cmnd *); diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index b259d42a1e1a..129001f600fc 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -767,6 +767,7 @@ scsi_template_proc_dir(const struct scsi_host_template *sht); #define scsi_template_proc_dir(sht) NULL #endif extern void scsi_scan_host(struct Scsi_Host *); +extern int scsi_resume_device(struct scsi_device *sdev); extern int scsi_rescan_device(struct scsi_device *sdev); extern void scsi_remove_host(struct Scsi_Host *); extern struct Scsi_Host *scsi_host_get(struct Scsi_Host *);

1 year, 8 months

3
2
0 0

[PATCH] scsi: sg: Avoid race in error handling & drop bogus warn

by Alexander Wetzel

commit 27f58c04a8f4 ("scsi: sg: Avoid sg device teardown race") introduced an incorrect WARN_ON_ONCE() and missed a sequence where sg_device_destroy() after scsi_device_put() when handling errors. sg_device_destroy() is accessing the parent scsi_device request_queue which will already be set to NULL when the preceding call to scsi_device_put() removed the last reference to the parent scsi_device. Drop the incorrect WARN_ON_ONCE() - allowing more than one concurrent access to the sg device - and make sure sg_device_destroy() is not used after scsi_device_put() in the error handling. Link: https://lore.kernel.org/all/5375B275-D137-4D5F-BE25-6AF8ACAE41EF@linux.ibm.… Fixes: 27f58c04a8f4 ("scsi: sg: Avoid sg device teardown race") Cc: stable(a)vger.kernel.org Signed-off-by: Alexander Wetzel <Alexander(a)wetzel-home.de> --- The WARN_ON_ONCE() was kind of stupid to add: We get add reference for each sg_open(). So opening a second session and then closing either one will trigger the warning... Nothing to warn about here. Alexander --- drivers/scsi/sg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 386981c6976a..833c9277419b 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -372,8 +372,9 @@ sg_open(struct inode *inode, struct file *filp) error_out: scsi_autopm_put_device(sdp->device); sdp_put: + kref_put(&sdp->d_ref, sg_device_destroy); scsi_device_put(sdp->device); - goto sg_put; + return retval; } /* Release resources associated with a successful sg_open() @@ -2233,7 +2234,6 @@ sg_remove_sfp_usercontext(struct work_struct *work) "sg_remove_sfp: sfp=0x%p\n", sfp)); kfree(sfp); - WARN_ON_ONCE(kref_read(&sdp->d_ref) != 1); kref_put(&sdp->d_ref, sg_device_destroy); scsi_device_put(device); module_put(THIS_MODULE); -- 2.44.0

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x f4d1960764d8a70318b02f15203a1be2b2554ca1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040119-ranked-doormat-088b@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: f4d1960764d8 ("USB: core: Fix deadlock in port "disable" sysfs attribute") f061f43d7418 ("usb: hub: port: add sysfs entry to switch port power") 8c67d06f3fd9 ("usb: Link the ports to the connectors they are attached to") b8f1ba99cea5 ("usb: hub: make wait_for_connected() take an int instead of a pointer to int") f59f93cd1d72 ("usb: hub: avoid warm port reset during USB3 disconnect") 7142452387c7 ("USB: Verify the port status when timeout happens during port suspend") 975f94c7d6c3 ("usb: core: hub: fix race condition about TRSMRCY of resume") 355c74e55e99 ("usb: export firmware port location in sysfs") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f4d1960764d8a70318b02f15203a1be2b2554ca1 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Fri, 15 Mar 2024 13:06:33 -0400 Subject: [PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute The show and store callback routines for the "disable" sysfs attribute file in port.c acquire the device lock for the port's parent hub device. This can cause problems if another process has locked the hub to remove it or change its configuration: Removing the hub or changing its configuration requires the hub interface to be removed, which requires the port device to be removed, and device_del() waits until all outstanding sysfs attribute callbacks for the ports have returned. The lock can't be released until then. But the disable_show() or disable_store() routine can't return until after it has acquired the lock. The resulting deadlock can be avoided by calling sysfs_break_active_protection(). This will cause the sysfs core not to wait for the attribute's callback routine to return, allowing the removal to proceed. The disadvantage is that after making this call, there is no guarantee that the hub structure won't be deallocated at any moment. To prevent this, we have to acquire a reference to it first by calling hub_get(). Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Cc: stable <stable(a)kernel.org> Link: https://lore.kernel.org/r/f7a8c135-a495-4ce6-bd49-405a45e7ea9a@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c index 5b5e613a11e5..686c01af03e6 100644 --- a/drivers/usb/core/port.c +++ b/drivers/usb/core/port.c @@ -56,11 +56,22 @@ static ssize_t disable_show(struct device *dev, u16 portstatus, unused; bool disabled; int rc; + struct kernfs_node *kn; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -70,9 +81,13 @@ static ssize_t disable_show(struct device *dev, usb_hub_port_status(hub, port1, &portstatus, &unused); disabled = !usb_port_is_power_on(hub, portstatus); -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); if (rc) return rc; @@ -90,15 +105,26 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, int port1 = port_dev->portnum; bool disabled; int rc; + struct kernfs_node *kn; rc = kstrtobool(buf, &disabled); if (rc) return rc; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -119,9 +145,13 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, if (!rc) rc = count; -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); return rc; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x f4d1960764d8a70318b02f15203a1be2b2554ca1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040117-shallow-faceless-7f3d@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: f4d1960764d8 ("USB: core: Fix deadlock in port "disable" sysfs attribute") f061f43d7418 ("usb: hub: port: add sysfs entry to switch port power") 8c67d06f3fd9 ("usb: Link the ports to the connectors they are attached to") b8f1ba99cea5 ("usb: hub: make wait_for_connected() take an int instead of a pointer to int") f59f93cd1d72 ("usb: hub: avoid warm port reset during USB3 disconnect") 7142452387c7 ("USB: Verify the port status when timeout happens during port suspend") 975f94c7d6c3 ("usb: core: hub: fix race condition about TRSMRCY of resume") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f4d1960764d8a70318b02f15203a1be2b2554ca1 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Fri, 15 Mar 2024 13:06:33 -0400 Subject: [PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute The show and store callback routines for the "disable" sysfs attribute file in port.c acquire the device lock for the port's parent hub device. This can cause problems if another process has locked the hub to remove it or change its configuration: Removing the hub or changing its configuration requires the hub interface to be removed, which requires the port device to be removed, and device_del() waits until all outstanding sysfs attribute callbacks for the ports have returned. The lock can't be released until then. But the disable_show() or disable_store() routine can't return until after it has acquired the lock. The resulting deadlock can be avoided by calling sysfs_break_active_protection(). This will cause the sysfs core not to wait for the attribute's callback routine to return, allowing the removal to proceed. The disadvantage is that after making this call, there is no guarantee that the hub structure won't be deallocated at any moment. To prevent this, we have to acquire a reference to it first by calling hub_get(). Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Cc: stable <stable(a)kernel.org> Link: https://lore.kernel.org/r/f7a8c135-a495-4ce6-bd49-405a45e7ea9a@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c index 5b5e613a11e5..686c01af03e6 100644 --- a/drivers/usb/core/port.c +++ b/drivers/usb/core/port.c @@ -56,11 +56,22 @@ static ssize_t disable_show(struct device *dev, u16 portstatus, unused; bool disabled; int rc; + struct kernfs_node *kn; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -70,9 +81,13 @@ static ssize_t disable_show(struct device *dev, usb_hub_port_status(hub, port1, &portstatus, &unused); disabled = !usb_port_is_power_on(hub, portstatus); -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); if (rc) return rc; @@ -90,15 +105,26 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, int port1 = port_dev->portnum; bool disabled; int rc; + struct kernfs_node *kn; rc = kstrtobool(buf, &disabled); if (rc) return rc; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -119,9 +145,13 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, if (!rc) rc = count; -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); return rc; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x f4d1960764d8a70318b02f15203a1be2b2554ca1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040116-mortician-grudging-9be5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: f4d1960764d8 ("USB: core: Fix deadlock in port "disable" sysfs attribute") f061f43d7418 ("usb: hub: port: add sysfs entry to switch port power") 8c67d06f3fd9 ("usb: Link the ports to the connectors they are attached to") b8f1ba99cea5 ("usb: hub: make wait_for_connected() take an int instead of a pointer to int") f59f93cd1d72 ("usb: hub: avoid warm port reset during USB3 disconnect") 7142452387c7 ("USB: Verify the port status when timeout happens during port suspend") 975f94c7d6c3 ("usb: core: hub: fix race condition about TRSMRCY of resume") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f4d1960764d8a70318b02f15203a1be2b2554ca1 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Fri, 15 Mar 2024 13:06:33 -0400 Subject: [PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute The show and store callback routines for the "disable" sysfs attribute file in port.c acquire the device lock for the port's parent hub device. This can cause problems if another process has locked the hub to remove it or change its configuration: Removing the hub or changing its configuration requires the hub interface to be removed, which requires the port device to be removed, and device_del() waits until all outstanding sysfs attribute callbacks for the ports have returned. The lock can't be released until then. But the disable_show() or disable_store() routine can't return until after it has acquired the lock. The resulting deadlock can be avoided by calling sysfs_break_active_protection(). This will cause the sysfs core not to wait for the attribute's callback routine to return, allowing the removal to proceed. The disadvantage is that after making this call, there is no guarantee that the hub structure won't be deallocated at any moment. To prevent this, we have to acquire a reference to it first by calling hub_get(). Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Cc: stable <stable(a)kernel.org> Link: https://lore.kernel.org/r/f7a8c135-a495-4ce6-bd49-405a45e7ea9a@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c index 5b5e613a11e5..686c01af03e6 100644 --- a/drivers/usb/core/port.c +++ b/drivers/usb/core/port.c @@ -56,11 +56,22 @@ static ssize_t disable_show(struct device *dev, u16 portstatus, unused; bool disabled; int rc; + struct kernfs_node *kn; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -70,9 +81,13 @@ static ssize_t disable_show(struct device *dev, usb_hub_port_status(hub, port1, &portstatus, &unused); disabled = !usb_port_is_power_on(hub, portstatus); -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); if (rc) return rc; @@ -90,15 +105,26 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, int port1 = port_dev->portnum; bool disabled; int rc; + struct kernfs_node *kn; rc = kstrtobool(buf, &disabled); if (rc) return rc; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -119,9 +145,13 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, if (!rc) rc = count; -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); return rc; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x f4d1960764d8a70318b02f15203a1be2b2554ca1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040114-outer-channel-e465@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: f4d1960764d8 ("USB: core: Fix deadlock in port "disable" sysfs attribute") f061f43d7418 ("usb: hub: port: add sysfs entry to switch port power") 8c67d06f3fd9 ("usb: Link the ports to the connectors they are attached to") b8f1ba99cea5 ("usb: hub: make wait_for_connected() take an int instead of a pointer to int") f59f93cd1d72 ("usb: hub: avoid warm port reset during USB3 disconnect") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f4d1960764d8a70318b02f15203a1be2b2554ca1 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Fri, 15 Mar 2024 13:06:33 -0400 Subject: [PATCH] USB: core: Fix deadlock in port "disable" sysfs attribute The show and store callback routines for the "disable" sysfs attribute file in port.c acquire the device lock for the port's parent hub device. This can cause problems if another process has locked the hub to remove it or change its configuration: Removing the hub or changing its configuration requires the hub interface to be removed, which requires the port device to be removed, and device_del() waits until all outstanding sysfs attribute callbacks for the ports have returned. The lock can't be released until then. But the disable_show() or disable_store() routine can't return until after it has acquired the lock. The resulting deadlock can be avoided by calling sysfs_break_active_protection(). This will cause the sysfs core not to wait for the attribute's callback routine to return, allowing the removal to proceed. The disadvantage is that after making this call, there is no guarantee that the hub structure won't be deallocated at any moment. To prevent this, we have to acquire a reference to it first by calling hub_get(). Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Cc: stable <stable(a)kernel.org> Link: https://lore.kernel.org/r/f7a8c135-a495-4ce6-bd49-405a45e7ea9a@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c index 5b5e613a11e5..686c01af03e6 100644 --- a/drivers/usb/core/port.c +++ b/drivers/usb/core/port.c @@ -56,11 +56,22 @@ static ssize_t disable_show(struct device *dev, u16 portstatus, unused; bool disabled; int rc; + struct kernfs_node *kn; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -70,9 +81,13 @@ static ssize_t disable_show(struct device *dev, usb_hub_port_status(hub, port1, &portstatus, &unused); disabled = !usb_port_is_power_on(hub, portstatus); -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); if (rc) return rc; @@ -90,15 +105,26 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, int port1 = port_dev->portnum; bool disabled; int rc; + struct kernfs_node *kn; rc = kstrtobool(buf, &disabled); if (rc) return rc; + hub_get(hub); rc = usb_autopm_get_interface(intf); if (rc < 0) - return rc; + goto out_hub_get; + /* + * Prevent deadlock if another process is concurrently + * trying to unregister hdev. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (!kn) { + rc = -ENODEV; + goto out_autopm; + } usb_lock_device(hdev); if (hub->disconnected) { rc = -ENODEV; @@ -119,9 +145,13 @@ static ssize_t disable_store(struct device *dev, struct device_attribute *attr, if (!rc) rc = count; -out_hdev_lock: + out_hdev_lock: usb_unlock_device(hdev); + sysfs_unbreak_active_protection(kn); + out_autopm: usb_autopm_put_interface(intf); + out_hub_get: + hub_put(hub); return rc; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Add hub_get() and hub_put() routines" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x ee113b860aa169e9a4d2c167c95d0f1961c6e1b8 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040159-scurvy-gestation-ae70@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: ee113b860aa1 ("USB: core: Add hub_get() and hub_put() routines") 3a6bf4a08142 ("usb: core: hub: Create platform devices for onboard hubs in hub_probe()") 1208f9e1d758 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub") 95d23dc27bde ("usb, kcov: collect coverage from hub_event") 8eb58994dd96 ("usb: hub: add retry routine after intr URB submit error") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ee113b860aa169e9a4d2c167c95d0f1961c6e1b8 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Fri, 15 Mar 2024 13:04:50 -0400 Subject: [PATCH] USB: core: Add hub_get() and hub_put() routines Create hub_get() and hub_put() routines to encapsulate the kref_get() and kref_put() calls in hub.c. The new routines will be used by the next patch in this series. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Link: https://lore.kernel.org/r/604da420-ae8a-4a9e-91a4-2d511ff404fb@rowland.harv… Cc: stable <stable(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 3ee8455585b6..9446660e231b 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -130,7 +130,6 @@ EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem); #define HUB_DEBOUNCE_STEP 25 #define HUB_DEBOUNCE_STABLE 100 -static void hub_release(struct kref *kref); static int usb_reset_and_verify_device(struct usb_device *udev); static int hub_port_disable(struct usb_hub *hub, int port1, int set_state); static bool hub_port_warm_reset_required(struct usb_hub *hub, int port1, @@ -720,14 +719,14 @@ static void kick_hub_wq(struct usb_hub *hub) */ intf = to_usb_interface(hub->intfdev); usb_autopm_get_interface_no_resume(intf); - kref_get(&hub->kref); + hub_get(hub); if (queue_work(hub_wq, &hub->events)) return; /* the work has already been scheduled */ usb_autopm_put_interface_async(intf); - kref_put(&hub->kref, hub_release); + hub_put(hub); } void usb_kick_hub_wq(struct usb_device *hdev) @@ -1095,7 +1094,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) goto init2; goto init3; } - kref_get(&hub->kref); + hub_get(hub); /* The superspeed hub except for root hub has to use Hub Depth * value as an offset into the route string to locate the bits @@ -1343,7 +1342,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) device_unlock(&hdev->dev); } - kref_put(&hub->kref, hub_release); + hub_put(hub); } /* Implement the continuations for the delays above */ @@ -1759,6 +1758,16 @@ static void hub_release(struct kref *kref) kfree(hub); } +void hub_get(struct usb_hub *hub) +{ + kref_get(&hub->kref); +} + +void hub_put(struct usb_hub *hub) +{ + kref_put(&hub->kref, hub_release); +} + static unsigned highspeed_hubs; static void hub_disconnect(struct usb_interface *intf) @@ -1807,7 +1816,7 @@ static void hub_disconnect(struct usb_interface *intf) onboard_hub_destroy_pdevs(&hub->onboard_hub_devs); - kref_put(&hub->kref, hub_release); + hub_put(hub); } static bool hub_descriptor_is_sane(struct usb_host_interface *desc) @@ -5934,7 +5943,7 @@ static void hub_event(struct work_struct *work) /* Balance the stuff in kick_hub_wq() and allow autosuspend */ usb_autopm_put_interface(intf); - kref_put(&hub->kref, hub_release); + hub_put(hub); kcov_remote_stop(); } diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h index 43ce21c96a51..183b69dc2955 100644 --- a/drivers/usb/core/hub.h +++ b/drivers/usb/core/hub.h @@ -129,6 +129,8 @@ extern void usb_hub_remove_port_device(struct usb_hub *hub, extern int usb_hub_set_port_power(struct usb_device *hdev, struct usb_hub *hub, int port1, bool set); extern struct usb_hub *usb_hub_to_struct_hub(struct usb_device *hdev); +extern void hub_get(struct usb_hub *hub); +extern void hub_put(struct usb_hub *hub); extern int hub_port_debounce(struct usb_hub *hub, int port1, bool must_be_connected); extern int usb_clear_port_feature(struct usb_device *hdev,

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Add hub_get() and hub_put() routines" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x ee113b860aa169e9a4d2c167c95d0f1961c6e1b8 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040158-stump-bucked-e91c@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: ee113b860aa1 ("USB: core: Add hub_get() and hub_put() routines") 3a6bf4a08142 ("usb: core: hub: Create platform devices for onboard hubs in hub_probe()") 1208f9e1d758 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub") 95d23dc27bde ("usb, kcov: collect coverage from hub_event") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ee113b860aa169e9a4d2c167c95d0f1961c6e1b8 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Fri, 15 Mar 2024 13:04:50 -0400 Subject: [PATCH] USB: core: Add hub_get() and hub_put() routines Create hub_get() and hub_put() routines to encapsulate the kref_get() and kref_put() calls in hub.c. The new routines will be used by the next patch in this series. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Link: https://lore.kernel.org/r/604da420-ae8a-4a9e-91a4-2d511ff404fb@rowland.harv… Cc: stable <stable(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 3ee8455585b6..9446660e231b 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -130,7 +130,6 @@ EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem); #define HUB_DEBOUNCE_STEP 25 #define HUB_DEBOUNCE_STABLE 100 -static void hub_release(struct kref *kref); static int usb_reset_and_verify_device(struct usb_device *udev); static int hub_port_disable(struct usb_hub *hub, int port1, int set_state); static bool hub_port_warm_reset_required(struct usb_hub *hub, int port1, @@ -720,14 +719,14 @@ static void kick_hub_wq(struct usb_hub *hub) */ intf = to_usb_interface(hub->intfdev); usb_autopm_get_interface_no_resume(intf); - kref_get(&hub->kref); + hub_get(hub); if (queue_work(hub_wq, &hub->events)) return; /* the work has already been scheduled */ usb_autopm_put_interface_async(intf); - kref_put(&hub->kref, hub_release); + hub_put(hub); } void usb_kick_hub_wq(struct usb_device *hdev) @@ -1095,7 +1094,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) goto init2; goto init3; } - kref_get(&hub->kref); + hub_get(hub); /* The superspeed hub except for root hub has to use Hub Depth * value as an offset into the route string to locate the bits @@ -1343,7 +1342,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) device_unlock(&hdev->dev); } - kref_put(&hub->kref, hub_release); + hub_put(hub); } /* Implement the continuations for the delays above */ @@ -1759,6 +1758,16 @@ static void hub_release(struct kref *kref) kfree(hub); } +void hub_get(struct usb_hub *hub) +{ + kref_get(&hub->kref); +} + +void hub_put(struct usb_hub *hub) +{ + kref_put(&hub->kref, hub_release); +} + static unsigned highspeed_hubs; static void hub_disconnect(struct usb_interface *intf) @@ -1807,7 +1816,7 @@ static void hub_disconnect(struct usb_interface *intf) onboard_hub_destroy_pdevs(&hub->onboard_hub_devs); - kref_put(&hub->kref, hub_release); + hub_put(hub); } static bool hub_descriptor_is_sane(struct usb_host_interface *desc) @@ -5934,7 +5943,7 @@ static void hub_event(struct work_struct *work) /* Balance the stuff in kick_hub_wq() and allow autosuspend */ usb_autopm_put_interface(intf); - kref_put(&hub->kref, hub_release); + hub_put(hub); kcov_remote_stop(); } diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h index 43ce21c96a51..183b69dc2955 100644 --- a/drivers/usb/core/hub.h +++ b/drivers/usb/core/hub.h @@ -129,6 +129,8 @@ extern void usb_hub_remove_port_device(struct usb_hub *hub, extern int usb_hub_set_port_power(struct usb_device *hdev, struct usb_hub *hub, int port1, bool set); extern struct usb_hub *usb_hub_to_struct_hub(struct usb_device *hdev); +extern void hub_get(struct usb_hub *hub); +extern void hub_put(struct usb_hub *hub); extern int hub_port_debounce(struct usb_hub *hub, int port1, bool must_be_connected); extern int usb_clear_port_feature(struct usb_device *hdev,

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in usb_deauthorize_interface()" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 80ba43e9f799cbdd83842fc27db667289b3150f5 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040155-shakiness-romp-690e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: 80ba43e9f799 ("USB: core: Fix deadlock in usb_deauthorize_interface()") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 80ba43e9f799cbdd83842fc27db667289b3150f5 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Tue, 12 Mar 2024 11:48:23 -0400 Subject: [PATCH] USB: core: Fix deadlock in usb_deauthorize_interface() Among the attribute file callback routines in drivers/usb/core/sysfs.c, the interface_authorized_store() function is the only one which acquires a device lock on an ancestor device: It calls usb_deauthorize_interface(), which locks the interface's parent USB device. The will lead to deadlock if another process already owns that lock and tries to remove the interface, whether through a configuration change or because the device has been disconnected. As part of the removal procedure, device_del() waits for all ongoing sysfs attribute callbacks to complete. But usb_deauthorize_interface() can't complete until the device lock has been released, and the lock won't be released until the removal has finished. The mechanism provided by sysfs to prevent this kind of deadlock is to use the sysfs_break_active_protection() function, which tells sysfs not to wait for the attribute callback. Reported-and-tested by: Yue Sun <samsun1006219(a)gmail.com> Reported by: xingwei lee <xrivendell7(a)gmail.com> Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Link: https://lore.kernel.org/linux-usb/CAEkJfYO6jRVC8Tfrd_R=cjO0hguhrV31fDPrLrNO… Fixes: 310d2b4124c0 ("usb: interface authorization: SysFS part of USB interface authorization") Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/1c37eea1-9f56-4534-b9d8-b443438dc869@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/sysfs.c b/drivers/usb/core/sysfs.c index f98263e21c2a..d83231d6736a 100644 --- a/drivers/usb/core/sysfs.c +++ b/drivers/usb/core/sysfs.c @@ -1217,14 +1217,24 @@ static ssize_t interface_authorized_store(struct device *dev, { struct usb_interface *intf = to_usb_interface(dev); bool val; + struct kernfs_node *kn; if (kstrtobool(buf, &val) != 0) return -EINVAL; - if (val) + if (val) { usb_authorize_interface(intf); - else - usb_deauthorize_interface(intf); + } else { + /* + * Prevent deadlock if another process is concurrently + * trying to unregister intf. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (kn) { + usb_deauthorize_interface(intf); + sysfs_unbreak_active_protection(kn); + } + } return count; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in usb_deauthorize_interface()" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 80ba43e9f799cbdd83842fc27db667289b3150f5 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040154-partner-wizard-6ead@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: 80ba43e9f799 ("USB: core: Fix deadlock in usb_deauthorize_interface()") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 80ba43e9f799cbdd83842fc27db667289b3150f5 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Tue, 12 Mar 2024 11:48:23 -0400 Subject: [PATCH] USB: core: Fix deadlock in usb_deauthorize_interface() Among the attribute file callback routines in drivers/usb/core/sysfs.c, the interface_authorized_store() function is the only one which acquires a device lock on an ancestor device: It calls usb_deauthorize_interface(), which locks the interface's parent USB device. The will lead to deadlock if another process already owns that lock and tries to remove the interface, whether through a configuration change or because the device has been disconnected. As part of the removal procedure, device_del() waits for all ongoing sysfs attribute callbacks to complete. But usb_deauthorize_interface() can't complete until the device lock has been released, and the lock won't be released until the removal has finished. The mechanism provided by sysfs to prevent this kind of deadlock is to use the sysfs_break_active_protection() function, which tells sysfs not to wait for the attribute callback. Reported-and-tested by: Yue Sun <samsun1006219(a)gmail.com> Reported by: xingwei lee <xrivendell7(a)gmail.com> Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Link: https://lore.kernel.org/linux-usb/CAEkJfYO6jRVC8Tfrd_R=cjO0hguhrV31fDPrLrNO… Fixes: 310d2b4124c0 ("usb: interface authorization: SysFS part of USB interface authorization") Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/1c37eea1-9f56-4534-b9d8-b443438dc869@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/sysfs.c b/drivers/usb/core/sysfs.c index f98263e21c2a..d83231d6736a 100644 --- a/drivers/usb/core/sysfs.c +++ b/drivers/usb/core/sysfs.c @@ -1217,14 +1217,24 @@ static ssize_t interface_authorized_store(struct device *dev, { struct usb_interface *intf = to_usb_interface(dev); bool val; + struct kernfs_node *kn; if (kstrtobool(buf, &val) != 0) return -EINVAL; - if (val) + if (val) { usb_authorize_interface(intf); - else - usb_deauthorize_interface(intf); + } else { + /* + * Prevent deadlock if another process is concurrently + * trying to unregister intf. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (kn) { + usb_deauthorize_interface(intf); + sysfs_unbreak_active_protection(kn); + } + } return count; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in usb_deauthorize_interface()" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 80ba43e9f799cbdd83842fc27db667289b3150f5 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040153-verbalize-drum-2ff0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: 80ba43e9f799 ("USB: core: Fix deadlock in usb_deauthorize_interface()") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 80ba43e9f799cbdd83842fc27db667289b3150f5 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Tue, 12 Mar 2024 11:48:23 -0400 Subject: [PATCH] USB: core: Fix deadlock in usb_deauthorize_interface() Among the attribute file callback routines in drivers/usb/core/sysfs.c, the interface_authorized_store() function is the only one which acquires a device lock on an ancestor device: It calls usb_deauthorize_interface(), which locks the interface's parent USB device. The will lead to deadlock if another process already owns that lock and tries to remove the interface, whether through a configuration change or because the device has been disconnected. As part of the removal procedure, device_del() waits for all ongoing sysfs attribute callbacks to complete. But usb_deauthorize_interface() can't complete until the device lock has been released, and the lock won't be released until the removal has finished. The mechanism provided by sysfs to prevent this kind of deadlock is to use the sysfs_break_active_protection() function, which tells sysfs not to wait for the attribute callback. Reported-and-tested by: Yue Sun <samsun1006219(a)gmail.com> Reported by: xingwei lee <xrivendell7(a)gmail.com> Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Link: https://lore.kernel.org/linux-usb/CAEkJfYO6jRVC8Tfrd_R=cjO0hguhrV31fDPrLrNO… Fixes: 310d2b4124c0 ("usb: interface authorization: SysFS part of USB interface authorization") Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/1c37eea1-9f56-4534-b9d8-b443438dc869@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/sysfs.c b/drivers/usb/core/sysfs.c index f98263e21c2a..d83231d6736a 100644 --- a/drivers/usb/core/sysfs.c +++ b/drivers/usb/core/sysfs.c @@ -1217,14 +1217,24 @@ static ssize_t interface_authorized_store(struct device *dev, { struct usb_interface *intf = to_usb_interface(dev); bool val; + struct kernfs_node *kn; if (kstrtobool(buf, &val) != 0) return -EINVAL; - if (val) + if (val) { usb_authorize_interface(intf); - else - usb_deauthorize_interface(intf); + } else { + /* + * Prevent deadlock if another process is concurrently + * trying to unregister intf. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (kn) { + usb_deauthorize_interface(intf); + sysfs_unbreak_active_protection(kn); + } + } return count; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] USB: core: Fix deadlock in usb_deauthorize_interface()" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 80ba43e9f799cbdd83842fc27db667289b3150f5 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040151-singular-unfunded-b5a8@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: 80ba43e9f799 ("USB: core: Fix deadlock in usb_deauthorize_interface()") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 80ba43e9f799cbdd83842fc27db667289b3150f5 Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Tue, 12 Mar 2024 11:48:23 -0400 Subject: [PATCH] USB: core: Fix deadlock in usb_deauthorize_interface() Among the attribute file callback routines in drivers/usb/core/sysfs.c, the interface_authorized_store() function is the only one which acquires a device lock on an ancestor device: It calls usb_deauthorize_interface(), which locks the interface's parent USB device. The will lead to deadlock if another process already owns that lock and tries to remove the interface, whether through a configuration change or because the device has been disconnected. As part of the removal procedure, device_del() waits for all ongoing sysfs attribute callbacks to complete. But usb_deauthorize_interface() can't complete until the device lock has been released, and the lock won't be released until the removal has finished. The mechanism provided by sysfs to prevent this kind of deadlock is to use the sysfs_break_active_protection() function, which tells sysfs not to wait for the attribute callback. Reported-and-tested by: Yue Sun <samsun1006219(a)gmail.com> Reported by: xingwei lee <xrivendell7(a)gmail.com> Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Link: https://lore.kernel.org/linux-usb/CAEkJfYO6jRVC8Tfrd_R=cjO0hguhrV31fDPrLrNO… Fixes: 310d2b4124c0 ("usb: interface authorization: SysFS part of USB interface authorization") Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/1c37eea1-9f56-4534-b9d8-b443438dc869@rowland.harv… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/core/sysfs.c b/drivers/usb/core/sysfs.c index f98263e21c2a..d83231d6736a 100644 --- a/drivers/usb/core/sysfs.c +++ b/drivers/usb/core/sysfs.c @@ -1217,14 +1217,24 @@ static ssize_t interface_authorized_store(struct device *dev, { struct usb_interface *intf = to_usb_interface(dev); bool val; + struct kernfs_node *kn; if (kstrtobool(buf, &val) != 0) return -EINVAL; - if (val) + if (val) { usb_authorize_interface(intf); - else - usb_deauthorize_interface(intf); + } else { + /* + * Prevent deadlock if another process is concurrently + * trying to unregister intf. + */ + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + if (kn) { + usb_deauthorize_interface(intf); + sysfs_unbreak_active_protection(kn); + } + } return count; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] usb: dwc3: Properly set system wakeup" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x f9aa41130ac69d13a53ce2a153ca79c70d43f39c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040103-preheated-anthology-d288@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: f9aa41130ac6 ("usb: dwc3: Properly set system wakeup") 047161686b81 ("usb: dwc3: Add remote wakeup handling") 63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer") 40edb52298df ("usb: dwc3: avoid NULL access of usb_gadget_driver") c560e76319a9 ("usb: dwc3: gadget: Fix START_TRANSFER link state check") 475e8be53d04 ("usb: dwc3: gadget: Check for disabled LPM quirk") 6f0764b5adea ("usb: dwc3: add a power supply for current control") 82c46b8ed9dc ("usb: dwc3: gadget: Introduce a DWC3 VBUS draw callback") f580170f135a ("usb: dwc3: Add splitdisable quirk for Hisilicon Kirin Soc") e81a7018d93a ("usb: dwc3: allocate gadget structure dynamically") c5a7092f4015 ("usb: dwc3: gadget: make starting isoc transfers more robust") 9af21dd6faeb ("usb: dwc3: Add support for DWC_usb32 IP") 8bb14308a869 ("usb: dwc3: core: Use role-switch default dr_mode") d0550cd20e52 ("usb: dwc3: gadget: Do link recovery for SS and SSP") d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit") 586f4335700f ("usb: dwc3: Fix GTXFIFOSIZ.TXFDEP macro name") 5eb5afb07853 ("usb: dwc3: use proper initializers for property entries") 9ba3aca8fe82 ("usb: dwc3: Disable phy suspend after power-on reset") a0a465569b45 ("usb: dwc3: remove generic PHY calibrate() calls") c09b73cfac2a ("usb: dwc3: don't set gadget->is_otg flag") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f9aa41130ac69d13a53ce2a153ca79c70d43f39c Mon Sep 17 00:00:00 2001 From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Date: Fri, 8 Mar 2024 02:40:25 +0000 Subject: [PATCH] usb: dwc3: Properly set system wakeup If the device is configured for system wakeup, then make sure that the xHCI driver knows about it and make sure to permit wakeup only at the appropriate time. For host mode, if the controller goes through the dwc3 code path, then a child xHCI platform device is created. Make sure the platform device also inherits the wakeup setting for xHCI to enable remote wakeup. For device mode, make sure to disable system wakeup if no gadget driver is bound. We may experience unwanted system wakeup due to the wakeup signal from the controller PMU detecting connection/disconnection when in low power (D3). E.g. In the case of Steam Deck, the PCI PME prevents the system staying in suspend. Cc: stable(a)vger.kernel.org Reported-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> Closes: https://lore.kernel.org/linux-usb/70a7692d-647c-9be7-00a6-06fc60f77294@igal… Fixes: d07e8819a03d ("usb: dwc3: add xHCI Host support") Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Tested-by: Sanath S <Sanath.S(a)amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> # Steam Deck Link: https://lore.kernel.org/r/667cfda7009b502e08462c8fb3f65841d103cc0a.17098654… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 3e55838c0001..31684cdaaae3 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1519,6 +1519,8 @@ static void dwc3_get_properties(struct dwc3 *dwc) else dwc->sysdev = dwc->dev; + dwc->sys_wakeup = device_may_wakeup(dwc->sysdev); + ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name); if (ret >= 0) { dwc->usb_psy = power_supply_get_by_name(usb_psy_name); diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index c07edfc954f7..7e80dd3d466b 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -1133,6 +1133,7 @@ struct dwc3_scratchpad_array { * 3 - Reserved * @dis_metastability_quirk: set to disable metastability quirk. * @dis_split_quirk: set to disable split boundary. + * @sys_wakeup: set if the device may do system wakeup. * @wakeup_configured: set if the device is configured for remote wakeup. * @suspended: set to track suspend event due to U3/L2. * @imod_interval: set the interrupt moderation interval in 250ns @@ -1357,6 +1358,7 @@ struct dwc3 { unsigned dis_split_quirk:1; unsigned async_callbacks:1; + unsigned sys_wakeup:1; unsigned wakeup_configured:1; unsigned suspended:1; diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 40c52dbc28d3..4df2661f6675 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2955,6 +2955,9 @@ static int dwc3_gadget_start(struct usb_gadget *g, dwc->gadget_driver = driver; spin_unlock_irqrestore(&dwc->lock, flags); + if (dwc->sys_wakeup) + device_wakeup_enable(dwc->sysdev); + return 0; } @@ -2970,6 +2973,9 @@ static int dwc3_gadget_stop(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); unsigned long flags; + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + spin_lock_irqsave(&dwc->lock, flags); dwc->gadget_driver = NULL; dwc->max_cfg_eps = 0; @@ -4651,6 +4657,10 @@ int dwc3_gadget_init(struct dwc3 *dwc) else dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed); + /* No system wakeup if no gadget driver bound */ + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + return 0; err5: diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c index 5a5cb6ce9946..0204787df81d 100644 --- a/drivers/usb/dwc3/host.c +++ b/drivers/usb/dwc3/host.c @@ -173,6 +173,14 @@ int dwc3_host_init(struct dwc3 *dwc) goto err; } + if (dwc->sys_wakeup) { + /* Restore wakeup setting if switched from device */ + device_wakeup_enable(dwc->sysdev); + + /* Pass on wakeup setting to the new xhci platform device */ + device_init_wakeup(&xhci->dev, true); + } + return 0; err: platform_device_put(xhci); @@ -181,6 +189,9 @@ int dwc3_host_init(struct dwc3 *dwc) void dwc3_host_exit(struct dwc3 *dwc) { + if (dwc->sys_wakeup) + device_init_wakeup(&dwc->xhci->dev, false); + platform_device_unregister(dwc->xhci); dwc->xhci = NULL; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] usb: dwc3: Properly set system wakeup" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x f9aa41130ac69d13a53ce2a153ca79c70d43f39c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040102-flatterer-enslave-672e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: f9aa41130ac6 ("usb: dwc3: Properly set system wakeup") 047161686b81 ("usb: dwc3: Add remote wakeup handling") 63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer") 40edb52298df ("usb: dwc3: avoid NULL access of usb_gadget_driver") c560e76319a9 ("usb: dwc3: gadget: Fix START_TRANSFER link state check") 475e8be53d04 ("usb: dwc3: gadget: Check for disabled LPM quirk") 6f0764b5adea ("usb: dwc3: add a power supply for current control") 82c46b8ed9dc ("usb: dwc3: gadget: Introduce a DWC3 VBUS draw callback") f580170f135a ("usb: dwc3: Add splitdisable quirk for Hisilicon Kirin Soc") e81a7018d93a ("usb: dwc3: allocate gadget structure dynamically") c5a7092f4015 ("usb: dwc3: gadget: make starting isoc transfers more robust") 9af21dd6faeb ("usb: dwc3: Add support for DWC_usb32 IP") 8bb14308a869 ("usb: dwc3: core: Use role-switch default dr_mode") d0550cd20e52 ("usb: dwc3: gadget: Do link recovery for SS and SSP") d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit") 586f4335700f ("usb: dwc3: Fix GTXFIFOSIZ.TXFDEP macro name") 5eb5afb07853 ("usb: dwc3: use proper initializers for property entries") 9ba3aca8fe82 ("usb: dwc3: Disable phy suspend after power-on reset") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f9aa41130ac69d13a53ce2a153ca79c70d43f39c Mon Sep 17 00:00:00 2001 From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Date: Fri, 8 Mar 2024 02:40:25 +0000 Subject: [PATCH] usb: dwc3: Properly set system wakeup If the device is configured for system wakeup, then make sure that the xHCI driver knows about it and make sure to permit wakeup only at the appropriate time. For host mode, if the controller goes through the dwc3 code path, then a child xHCI platform device is created. Make sure the platform device also inherits the wakeup setting for xHCI to enable remote wakeup. For device mode, make sure to disable system wakeup if no gadget driver is bound. We may experience unwanted system wakeup due to the wakeup signal from the controller PMU detecting connection/disconnection when in low power (D3). E.g. In the case of Steam Deck, the PCI PME prevents the system staying in suspend. Cc: stable(a)vger.kernel.org Reported-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> Closes: https://lore.kernel.org/linux-usb/70a7692d-647c-9be7-00a6-06fc60f77294@igal… Fixes: d07e8819a03d ("usb: dwc3: add xHCI Host support") Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Tested-by: Sanath S <Sanath.S(a)amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> # Steam Deck Link: https://lore.kernel.org/r/667cfda7009b502e08462c8fb3f65841d103cc0a.17098654… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 3e55838c0001..31684cdaaae3 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1519,6 +1519,8 @@ static void dwc3_get_properties(struct dwc3 *dwc) else dwc->sysdev = dwc->dev; + dwc->sys_wakeup = device_may_wakeup(dwc->sysdev); + ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name); if (ret >= 0) { dwc->usb_psy = power_supply_get_by_name(usb_psy_name); diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index c07edfc954f7..7e80dd3d466b 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -1133,6 +1133,7 @@ struct dwc3_scratchpad_array { * 3 - Reserved * @dis_metastability_quirk: set to disable metastability quirk. * @dis_split_quirk: set to disable split boundary. + * @sys_wakeup: set if the device may do system wakeup. * @wakeup_configured: set if the device is configured for remote wakeup. * @suspended: set to track suspend event due to U3/L2. * @imod_interval: set the interrupt moderation interval in 250ns @@ -1357,6 +1358,7 @@ struct dwc3 { unsigned dis_split_quirk:1; unsigned async_callbacks:1; + unsigned sys_wakeup:1; unsigned wakeup_configured:1; unsigned suspended:1; diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 40c52dbc28d3..4df2661f6675 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2955,6 +2955,9 @@ static int dwc3_gadget_start(struct usb_gadget *g, dwc->gadget_driver = driver; spin_unlock_irqrestore(&dwc->lock, flags); + if (dwc->sys_wakeup) + device_wakeup_enable(dwc->sysdev); + return 0; } @@ -2970,6 +2973,9 @@ static int dwc3_gadget_stop(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); unsigned long flags; + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + spin_lock_irqsave(&dwc->lock, flags); dwc->gadget_driver = NULL; dwc->max_cfg_eps = 0; @@ -4651,6 +4657,10 @@ int dwc3_gadget_init(struct dwc3 *dwc) else dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed); + /* No system wakeup if no gadget driver bound */ + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + return 0; err5: diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c index 5a5cb6ce9946..0204787df81d 100644 --- a/drivers/usb/dwc3/host.c +++ b/drivers/usb/dwc3/host.c @@ -173,6 +173,14 @@ int dwc3_host_init(struct dwc3 *dwc) goto err; } + if (dwc->sys_wakeup) { + /* Restore wakeup setting if switched from device */ + device_wakeup_enable(dwc->sysdev); + + /* Pass on wakeup setting to the new xhci platform device */ + device_init_wakeup(&xhci->dev, true); + } + return 0; err: platform_device_put(xhci); @@ -181,6 +189,9 @@ int dwc3_host_init(struct dwc3 *dwc) void dwc3_host_exit(struct dwc3 *dwc) { + if (dwc->sys_wakeup) + device_init_wakeup(&dwc->xhci->dev, false); + platform_device_unregister(dwc->xhci); dwc->xhci = NULL; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] usb: dwc3: Properly set system wakeup" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x f9aa41130ac69d13a53ce2a153ca79c70d43f39c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040101-uprising-avid-b607@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: f9aa41130ac6 ("usb: dwc3: Properly set system wakeup") 047161686b81 ("usb: dwc3: Add remote wakeup handling") 63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer") 40edb52298df ("usb: dwc3: avoid NULL access of usb_gadget_driver") c560e76319a9 ("usb: dwc3: gadget: Fix START_TRANSFER link state check") 475e8be53d04 ("usb: dwc3: gadget: Check for disabled LPM quirk") 6f0764b5adea ("usb: dwc3: add a power supply for current control") 82c46b8ed9dc ("usb: dwc3: gadget: Introduce a DWC3 VBUS draw callback") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f9aa41130ac69d13a53ce2a153ca79c70d43f39c Mon Sep 17 00:00:00 2001 From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Date: Fri, 8 Mar 2024 02:40:25 +0000 Subject: [PATCH] usb: dwc3: Properly set system wakeup If the device is configured for system wakeup, then make sure that the xHCI driver knows about it and make sure to permit wakeup only at the appropriate time. For host mode, if the controller goes through the dwc3 code path, then a child xHCI platform device is created. Make sure the platform device also inherits the wakeup setting for xHCI to enable remote wakeup. For device mode, make sure to disable system wakeup if no gadget driver is bound. We may experience unwanted system wakeup due to the wakeup signal from the controller PMU detecting connection/disconnection when in low power (D3). E.g. In the case of Steam Deck, the PCI PME prevents the system staying in suspend. Cc: stable(a)vger.kernel.org Reported-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> Closes: https://lore.kernel.org/linux-usb/70a7692d-647c-9be7-00a6-06fc60f77294@igal… Fixes: d07e8819a03d ("usb: dwc3: add xHCI Host support") Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Tested-by: Sanath S <Sanath.S(a)amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> # Steam Deck Link: https://lore.kernel.org/r/667cfda7009b502e08462c8fb3f65841d103cc0a.17098654… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 3e55838c0001..31684cdaaae3 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1519,6 +1519,8 @@ static void dwc3_get_properties(struct dwc3 *dwc) else dwc->sysdev = dwc->dev; + dwc->sys_wakeup = device_may_wakeup(dwc->sysdev); + ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name); if (ret >= 0) { dwc->usb_psy = power_supply_get_by_name(usb_psy_name); diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index c07edfc954f7..7e80dd3d466b 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -1133,6 +1133,7 @@ struct dwc3_scratchpad_array { * 3 - Reserved * @dis_metastability_quirk: set to disable metastability quirk. * @dis_split_quirk: set to disable split boundary. + * @sys_wakeup: set if the device may do system wakeup. * @wakeup_configured: set if the device is configured for remote wakeup. * @suspended: set to track suspend event due to U3/L2. * @imod_interval: set the interrupt moderation interval in 250ns @@ -1357,6 +1358,7 @@ struct dwc3 { unsigned dis_split_quirk:1; unsigned async_callbacks:1; + unsigned sys_wakeup:1; unsigned wakeup_configured:1; unsigned suspended:1; diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 40c52dbc28d3..4df2661f6675 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2955,6 +2955,9 @@ static int dwc3_gadget_start(struct usb_gadget *g, dwc->gadget_driver = driver; spin_unlock_irqrestore(&dwc->lock, flags); + if (dwc->sys_wakeup) + device_wakeup_enable(dwc->sysdev); + return 0; } @@ -2970,6 +2973,9 @@ static int dwc3_gadget_stop(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); unsigned long flags; + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + spin_lock_irqsave(&dwc->lock, flags); dwc->gadget_driver = NULL; dwc->max_cfg_eps = 0; @@ -4651,6 +4657,10 @@ int dwc3_gadget_init(struct dwc3 *dwc) else dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed); + /* No system wakeup if no gadget driver bound */ + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + return 0; err5: diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c index 5a5cb6ce9946..0204787df81d 100644 --- a/drivers/usb/dwc3/host.c +++ b/drivers/usb/dwc3/host.c @@ -173,6 +173,14 @@ int dwc3_host_init(struct dwc3 *dwc) goto err; } + if (dwc->sys_wakeup) { + /* Restore wakeup setting if switched from device */ + device_wakeup_enable(dwc->sysdev); + + /* Pass on wakeup setting to the new xhci platform device */ + device_init_wakeup(&xhci->dev, true); + } + return 0; err: platform_device_put(xhci); @@ -181,6 +189,9 @@ int dwc3_host_init(struct dwc3 *dwc) void dwc3_host_exit(struct dwc3 *dwc) { + if (dwc->sys_wakeup) + device_init_wakeup(&dwc->xhci->dev, false); + platform_device_unregister(dwc->xhci); dwc->xhci = NULL; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] usb: dwc3: Properly set system wakeup" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x f9aa41130ac69d13a53ce2a153ca79c70d43f39c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040159-entrench-bogged-287d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: f9aa41130ac6 ("usb: dwc3: Properly set system wakeup") 047161686b81 ("usb: dwc3: Add remote wakeup handling") 63c4c320ccf7 ("usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f9aa41130ac69d13a53ce2a153ca79c70d43f39c Mon Sep 17 00:00:00 2001 From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Date: Fri, 8 Mar 2024 02:40:25 +0000 Subject: [PATCH] usb: dwc3: Properly set system wakeup If the device is configured for system wakeup, then make sure that the xHCI driver knows about it and make sure to permit wakeup only at the appropriate time. For host mode, if the controller goes through the dwc3 code path, then a child xHCI platform device is created. Make sure the platform device also inherits the wakeup setting for xHCI to enable remote wakeup. For device mode, make sure to disable system wakeup if no gadget driver is bound. We may experience unwanted system wakeup due to the wakeup signal from the controller PMU detecting connection/disconnection when in low power (D3). E.g. In the case of Steam Deck, the PCI PME prevents the system staying in suspend. Cc: stable(a)vger.kernel.org Reported-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> Closes: https://lore.kernel.org/linux-usb/70a7692d-647c-9be7-00a6-06fc60f77294@igal… Fixes: d07e8819a03d ("usb: dwc3: add xHCI Host support") Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Tested-by: Sanath S <Sanath.S(a)amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> # Steam Deck Link: https://lore.kernel.org/r/667cfda7009b502e08462c8fb3f65841d103cc0a.17098654… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 3e55838c0001..31684cdaaae3 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1519,6 +1519,8 @@ static void dwc3_get_properties(struct dwc3 *dwc) else dwc->sysdev = dwc->dev; + dwc->sys_wakeup = device_may_wakeup(dwc->sysdev); + ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name); if (ret >= 0) { dwc->usb_psy = power_supply_get_by_name(usb_psy_name); diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index c07edfc954f7..7e80dd3d466b 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -1133,6 +1133,7 @@ struct dwc3_scratchpad_array { * 3 - Reserved * @dis_metastability_quirk: set to disable metastability quirk. * @dis_split_quirk: set to disable split boundary. + * @sys_wakeup: set if the device may do system wakeup. * @wakeup_configured: set if the device is configured for remote wakeup. * @suspended: set to track suspend event due to U3/L2. * @imod_interval: set the interrupt moderation interval in 250ns @@ -1357,6 +1358,7 @@ struct dwc3 { unsigned dis_split_quirk:1; unsigned async_callbacks:1; + unsigned sys_wakeup:1; unsigned wakeup_configured:1; unsigned suspended:1; diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 40c52dbc28d3..4df2661f6675 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2955,6 +2955,9 @@ static int dwc3_gadget_start(struct usb_gadget *g, dwc->gadget_driver = driver; spin_unlock_irqrestore(&dwc->lock, flags); + if (dwc->sys_wakeup) + device_wakeup_enable(dwc->sysdev); + return 0; } @@ -2970,6 +2973,9 @@ static int dwc3_gadget_stop(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); unsigned long flags; + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + spin_lock_irqsave(&dwc->lock, flags); dwc->gadget_driver = NULL; dwc->max_cfg_eps = 0; @@ -4651,6 +4657,10 @@ int dwc3_gadget_init(struct dwc3 *dwc) else dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed); + /* No system wakeup if no gadget driver bound */ + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + return 0; err5: diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c index 5a5cb6ce9946..0204787df81d 100644 --- a/drivers/usb/dwc3/host.c +++ b/drivers/usb/dwc3/host.c @@ -173,6 +173,14 @@ int dwc3_host_init(struct dwc3 *dwc) goto err; } + if (dwc->sys_wakeup) { + /* Restore wakeup setting if switched from device */ + device_wakeup_enable(dwc->sysdev); + + /* Pass on wakeup setting to the new xhci platform device */ + device_init_wakeup(&xhci->dev, true); + } + return 0; err: platform_device_put(xhci); @@ -181,6 +189,9 @@ int dwc3_host_init(struct dwc3 *dwc) void dwc3_host_exit(struct dwc3 *dwc) { + if (dwc->sys_wakeup) + device_init_wakeup(&dwc->xhci->dev, false); + platform_device_unregister(dwc->xhci); dwc->xhci = NULL; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] usb: dwc3: Properly set system wakeup" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x f9aa41130ac69d13a53ce2a153ca79c70d43f39c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040158-headrest-purge-7899@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: f9aa41130ac6 ("usb: dwc3: Properly set system wakeup") 047161686b81 ("usb: dwc3: Add remote wakeup handling") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f9aa41130ac69d13a53ce2a153ca79c70d43f39c Mon Sep 17 00:00:00 2001 From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Date: Fri, 8 Mar 2024 02:40:25 +0000 Subject: [PATCH] usb: dwc3: Properly set system wakeup If the device is configured for system wakeup, then make sure that the xHCI driver knows about it and make sure to permit wakeup only at the appropriate time. For host mode, if the controller goes through the dwc3 code path, then a child xHCI platform device is created. Make sure the platform device also inherits the wakeup setting for xHCI to enable remote wakeup. For device mode, make sure to disable system wakeup if no gadget driver is bound. We may experience unwanted system wakeup due to the wakeup signal from the controller PMU detecting connection/disconnection when in low power (D3). E.g. In the case of Steam Deck, the PCI PME prevents the system staying in suspend. Cc: stable(a)vger.kernel.org Reported-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> Closes: https://lore.kernel.org/linux-usb/70a7692d-647c-9be7-00a6-06fc60f77294@igal… Fixes: d07e8819a03d ("usb: dwc3: add xHCI Host support") Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> Tested-by: Sanath S <Sanath.S(a)amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> # Steam Deck Link: https://lore.kernel.org/r/667cfda7009b502e08462c8fb3f65841d103cc0a.17098654… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 3e55838c0001..31684cdaaae3 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1519,6 +1519,8 @@ static void dwc3_get_properties(struct dwc3 *dwc) else dwc->sysdev = dwc->dev; + dwc->sys_wakeup = device_may_wakeup(dwc->sysdev); + ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name); if (ret >= 0) { dwc->usb_psy = power_supply_get_by_name(usb_psy_name); diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index c07edfc954f7..7e80dd3d466b 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -1133,6 +1133,7 @@ struct dwc3_scratchpad_array { * 3 - Reserved * @dis_metastability_quirk: set to disable metastability quirk. * @dis_split_quirk: set to disable split boundary. + * @sys_wakeup: set if the device may do system wakeup. * @wakeup_configured: set if the device is configured for remote wakeup. * @suspended: set to track suspend event due to U3/L2. * @imod_interval: set the interrupt moderation interval in 250ns @@ -1357,6 +1358,7 @@ struct dwc3 { unsigned dis_split_quirk:1; unsigned async_callbacks:1; + unsigned sys_wakeup:1; unsigned wakeup_configured:1; unsigned suspended:1; diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 40c52dbc28d3..4df2661f6675 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2955,6 +2955,9 @@ static int dwc3_gadget_start(struct usb_gadget *g, dwc->gadget_driver = driver; spin_unlock_irqrestore(&dwc->lock, flags); + if (dwc->sys_wakeup) + device_wakeup_enable(dwc->sysdev); + return 0; } @@ -2970,6 +2973,9 @@ static int dwc3_gadget_stop(struct usb_gadget *g) struct dwc3 *dwc = gadget_to_dwc(g); unsigned long flags; + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + spin_lock_irqsave(&dwc->lock, flags); dwc->gadget_driver = NULL; dwc->max_cfg_eps = 0; @@ -4651,6 +4657,10 @@ int dwc3_gadget_init(struct dwc3 *dwc) else dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed); + /* No system wakeup if no gadget driver bound */ + if (dwc->sys_wakeup) + device_wakeup_disable(dwc->sysdev); + return 0; err5: diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c index 5a5cb6ce9946..0204787df81d 100644 --- a/drivers/usb/dwc3/host.c +++ b/drivers/usb/dwc3/host.c @@ -173,6 +173,14 @@ int dwc3_host_init(struct dwc3 *dwc) goto err; } + if (dwc->sys_wakeup) { + /* Restore wakeup setting if switched from device */ + device_wakeup_enable(dwc->sysdev); + + /* Pass on wakeup setting to the new xhci platform device */ + device_init_wakeup(&xhci->dev, true); + } + return 0; err: platform_device_put(xhci); @@ -181,6 +189,9 @@ int dwc3_host_init(struct dwc3 *dwc) void dwc3_host_exit(struct dwc3 *dwc) { + if (dwc->sys_wakeup) + device_init_wakeup(&dwc->xhci->dev, false); + platform_device_unregister(dwc->xhci); dwc->xhci = NULL; }

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] x86/sev: Skip ROM range scans and validation for SEV-SNP" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 0f4a1e80989aca185d955fcd791d7750082044a2 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040110-nimble-unfair-399c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: 0f4a1e80989a ("x86/sev: Skip ROM range scans and validation for SEV-SNP guests") 428080c9b19b ("x86/sev: Move early startup code into .head.text section") b82a8dbd3d2f ("x86/coco: Disable 32-bit emulation by default on TDX and SEV") ed766c26119c ("Merge tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 0f4a1e80989aca185d955fcd791d7750082044a2 Mon Sep 17 00:00:00 2001 From: Kevin Loughlin <kevinloughlin(a)google.com> Date: Wed, 13 Mar 2024 12:15:46 +0000 Subject: [PATCH] x86/sev: Skip ROM range scans and validation for SEV-SNP guests SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes to access this range, the guest must first validate the range. The current SEV-SNP code does indeed scan the ROM range during early boot and thus attempts to validate the ROM range in probe_roms(). However, this behavior is neither sufficient nor necessary for the following reasons: * With regards to sufficiency, if EFI_CONFIG_TABLES are not enabled and CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which falls in the ROM range) prior to validation. For example, Project Oak Stage 0 provides a minimal guest firmware that currently meets these configuration conditions, meaning guests booting atop Oak Stage 0 firmware encounter a problematic call chain during dmi_setup() -> dmi_scan_machine() that results in a crash during boot if SEV-SNP is enabled. * With regards to necessity, SEV-SNP guests generally read garbage (which changes across boots) from the ROM range, meaning these scans are unnecessary. The guest reads garbage because the legacy ROM range is unencrypted data but is accessed via an encrypted PMD during early boot (where the PMD is marked as encrypted due to potentially mapping actually-encrypted data in other PMD-contained ranges). In one exceptional case, EISA probing treats the ROM range as unencrypted data, which is inconsistent with other probing. Continuing to allow SEV-SNP guests to use garbage and to inconsistently classify ROM range encryption status can trigger undesirable behavior. For instance, if garbage bytes appear to be a valid signature, memory may be unnecessarily reserved for the ROM range. Future code or other use cases may result in more problematic (arbitrary) behavior that should be avoided. While one solution would be to overhaul the early PMD mapping to always treat the ROM region of the PMD as unencrypted, SEV-SNP guests do not currently rely on data from the ROM region during early boot (and even if they did, they would be mostly relying on garbage data anyways). As a simpler solution, skip the ROM range scans (and the otherwise- necessary range validation) during SEV-SNP guest early boot. The potential SEV-SNP guest crash due to lack of ROM range validation is thus avoided by simply not accessing the ROM range. In most cases, skip the scans by overriding problematic x86_init functions during sme_early_init() to SNP-safe variants, which can be likened to x86_init overrides done for other platforms (ex: Xen); such overrides also avoid the spread of cc_platform_has() checks throughout the tree. In the exceptional EISA case, still use cc_platform_has() for the simplest change, given (1) checks for guest type (ex: Xen domain status) are already performed here, and (2) these checks occur in a subsys initcall instead of an x86_init function. [ bp: Massage commit message, remove "we"s. ] Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active") Signed-off-by: Kevin Loughlin <kevinloughlin(a)google.com> Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: <stable(a)kernel.org> Link: https://lore.kernel.org/r/20240313121546.2964854-1-kevinloughlin@google.com diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index 9477b4053bce..07e125f32528 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -218,12 +218,12 @@ void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages); void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages); -void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op); void snp_set_memory_shared(unsigned long vaddr, unsigned long npages); void snp_set_memory_private(unsigned long vaddr, unsigned long npages); void snp_set_wakeup_secondary_cpu(void); bool snp_init(struct boot_params *bp); void __noreturn snp_abort(void); +void snp_dmi_setup(void); int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio); void snp_accept_memory(phys_addr_t start, phys_addr_t end); u64 snp_get_unsupported_features(u64 status); @@ -244,12 +244,12 @@ static inline void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { } static inline void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages) { } -static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { } static inline void snp_set_memory_shared(unsigned long vaddr, unsigned long npages) { } static inline void snp_set_memory_private(unsigned long vaddr, unsigned long npages) { } static inline void snp_set_wakeup_secondary_cpu(void) { } static inline bool snp_init(struct boot_params *bp) { return false; } static inline void snp_abort(void) { } +static inline void snp_dmi_setup(void) { } static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio) { return -ENOTTY; diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index b89b40f250e6..6149eabe200f 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -30,12 +30,13 @@ struct x86_init_mpparse { * @reserve_resources: reserve the standard resources for the * platform * @memory_setup: platform specific memory setup - * + * @dmi_setup: platform specific DMI setup */ struct x86_init_resources { void (*probe_roms)(void); void (*reserve_resources)(void); char *(*memory_setup)(void); + void (*dmi_setup)(void); }; /** diff --git a/arch/x86/kernel/eisa.c b/arch/x86/kernel/eisa.c index e963344b0449..53935b4d62e3 100644 --- a/arch/x86/kernel/eisa.c +++ b/arch/x86/kernel/eisa.c @@ -2,6 +2,7 @@ /* * EISA specific code */ +#include <linux/cc_platform.h> #include <linux/ioport.h> #include <linux/eisa.h> #include <linux/io.h> @@ -12,7 +13,7 @@ static __init int eisa_bus_probe(void) { void __iomem *p; - if (xen_pv_domain() && !xen_initial_domain()) + if ((xen_pv_domain() && !xen_initial_domain()) || cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) return 0; p = ioremap(0x0FFFD9, 4); diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c index 319fef37d9dc..cc2c34ba7228 100644 --- a/arch/x86/kernel/probe_roms.c +++ b/arch/x86/kernel/probe_roms.c @@ -203,16 +203,6 @@ void __init probe_roms(void) unsigned char c; int i; - /* - * The ROM memory range is not part of the e820 table and is therefore not - * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted - * memory, and SNP requires encrypted memory to be validated before access. - * Do that here. - */ - snp_prep_memory(video_rom_resource.start, - ((system_rom_resource.end + 1) - video_rom_resource.start), - SNP_PAGE_STATE_PRIVATE); - /* video rom */ upper = adapter_rom_resources[0].start; for (start = video_rom_resource.start; start < upper; start += 2048) { diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index ef206500ed6f..0109e6c510e0 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -9,7 +9,6 @@ #include <linux/console.h> #include <linux/crash_dump.h> #include <linux/dma-map-ops.h> -#include <linux/dmi.h> #include <linux/efi.h> #include <linux/ima.h> #include <linux/init_ohci1394_dma.h> @@ -902,7 +901,7 @@ void __init setup_arch(char **cmdline_p) efi_init(); reserve_ibft_region(); - dmi_setup(); + x86_init.resources.dmi_setup(); /* * VMware detection requires dmi to be available, so this diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index b59b09c2f284..7e1e63cc48e6 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -23,6 +23,7 @@ #include <linux/platform_device.h> #include <linux/io.h> #include <linux/psp-sev.h> +#include <linux/dmi.h> #include <uapi/linux/sev-guest.h> #include <asm/init.h> @@ -795,21 +796,6 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED); } -void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) -{ - unsigned long vaddr, npages; - - vaddr = (unsigned long)__va(paddr); - npages = PAGE_ALIGN(sz) >> PAGE_SHIFT; - - if (op == SNP_PAGE_STATE_PRIVATE) - early_snp_set_memory_private(vaddr, paddr, npages); - else if (op == SNP_PAGE_STATE_SHARED) - early_snp_set_memory_shared(vaddr, paddr, npages); - else - WARN(1, "invalid memory op %d\n", op); -} - static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr, unsigned long vaddr_end, int op) { @@ -2136,6 +2122,17 @@ void __head __noreturn snp_abort(void) sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED); } +/* + * SEV-SNP guests should only execute dmi_setup() if EFI_CONFIG_TABLES are + * enabled, as the alternative (fallback) logic for DMI probing in the legacy + * ROM region can cause a crash since this region is not pre-validated. + */ +void __init snp_dmi_setup(void) +{ + if (efi_enabled(EFI_CONFIG_TABLES)) + dmi_setup(); +} + static void dump_cpuid_table(void) { const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table(); diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index a42830dc151b..d5dc5a92635a 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -3,6 +3,7 @@ * * For licencing details see kernel-base/COPYING */ +#include <linux/dmi.h> #include <linux/init.h> #include <linux/ioport.h> #include <linux/export.h> @@ -66,6 +67,7 @@ struct x86_init_ops x86_init __initdata = { .probe_roms = probe_roms, .reserve_resources = reserve_standard_io_resources, .memory_setup = e820__memory_setup_default, + .dmi_setup = dmi_setup, }, .mpparse = { diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c index 70b91de2e053..422602f6039b 100644 --- a/arch/x86/mm/mem_encrypt_amd.c +++ b/arch/x86/mm/mem_encrypt_amd.c @@ -492,6 +492,24 @@ void __init sme_early_init(void) */ if (sev_status & MSR_AMD64_SEV_ENABLED) ia32_disable(); + + /* + * Override init functions that scan the ROM region in SEV-SNP guests, + * as this memory is not pre-validated and would thus cause a crash. + */ + if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) { + x86_init.mpparse.find_mptable = x86_init_noop; + x86_init.pci.init_irq = x86_init_noop; + x86_init.resources.probe_roms = x86_init_noop; + + /* + * DMI setup behavior for SEV-SNP guests depends on + * efi_enabled(EFI_CONFIG_TABLES), which hasn't been + * parsed yet. snp_dmi_setup() will run after that + * parsing has happened. + */ + x86_init.resources.dmi_setup = snp_dmi_setup; + } } void __init mem_encrypt_free_decrypted_mem(void)

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] x86/sev: Skip ROM range scans and validation for SEV-SNP" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 0f4a1e80989aca185d955fcd791d7750082044a2 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040108-dreadful-pumice-e64d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: 0f4a1e80989a ("x86/sev: Skip ROM range scans and validation for SEV-SNP guests") 428080c9b19b ("x86/sev: Move early startup code into .head.text section") b82a8dbd3d2f ("x86/coco: Disable 32-bit emulation by default on TDX and SEV") ed766c26119c ("Merge tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 0f4a1e80989aca185d955fcd791d7750082044a2 Mon Sep 17 00:00:00 2001 From: Kevin Loughlin <kevinloughlin(a)google.com> Date: Wed, 13 Mar 2024 12:15:46 +0000 Subject: [PATCH] x86/sev: Skip ROM range scans and validation for SEV-SNP guests SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes to access this range, the guest must first validate the range. The current SEV-SNP code does indeed scan the ROM range during early boot and thus attempts to validate the ROM range in probe_roms(). However, this behavior is neither sufficient nor necessary for the following reasons: * With regards to sufficiency, if EFI_CONFIG_TABLES are not enabled and CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which falls in the ROM range) prior to validation. For example, Project Oak Stage 0 provides a minimal guest firmware that currently meets these configuration conditions, meaning guests booting atop Oak Stage 0 firmware encounter a problematic call chain during dmi_setup() -> dmi_scan_machine() that results in a crash during boot if SEV-SNP is enabled. * With regards to necessity, SEV-SNP guests generally read garbage (which changes across boots) from the ROM range, meaning these scans are unnecessary. The guest reads garbage because the legacy ROM range is unencrypted data but is accessed via an encrypted PMD during early boot (where the PMD is marked as encrypted due to potentially mapping actually-encrypted data in other PMD-contained ranges). In one exceptional case, EISA probing treats the ROM range as unencrypted data, which is inconsistent with other probing. Continuing to allow SEV-SNP guests to use garbage and to inconsistently classify ROM range encryption status can trigger undesirable behavior. For instance, if garbage bytes appear to be a valid signature, memory may be unnecessarily reserved for the ROM range. Future code or other use cases may result in more problematic (arbitrary) behavior that should be avoided. While one solution would be to overhaul the early PMD mapping to always treat the ROM region of the PMD as unencrypted, SEV-SNP guests do not currently rely on data from the ROM region during early boot (and even if they did, they would be mostly relying on garbage data anyways). As a simpler solution, skip the ROM range scans (and the otherwise- necessary range validation) during SEV-SNP guest early boot. The potential SEV-SNP guest crash due to lack of ROM range validation is thus avoided by simply not accessing the ROM range. In most cases, skip the scans by overriding problematic x86_init functions during sme_early_init() to SNP-safe variants, which can be likened to x86_init overrides done for other platforms (ex: Xen); such overrides also avoid the spread of cc_platform_has() checks throughout the tree. In the exceptional EISA case, still use cc_platform_has() for the simplest change, given (1) checks for guest type (ex: Xen domain status) are already performed here, and (2) these checks occur in a subsys initcall instead of an x86_init function. [ bp: Massage commit message, remove "we"s. ] Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active") Signed-off-by: Kevin Loughlin <kevinloughlin(a)google.com> Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: <stable(a)kernel.org> Link: https://lore.kernel.org/r/20240313121546.2964854-1-kevinloughlin@google.com diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index 9477b4053bce..07e125f32528 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -218,12 +218,12 @@ void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages); void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages); -void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op); void snp_set_memory_shared(unsigned long vaddr, unsigned long npages); void snp_set_memory_private(unsigned long vaddr, unsigned long npages); void snp_set_wakeup_secondary_cpu(void); bool snp_init(struct boot_params *bp); void __noreturn snp_abort(void); +void snp_dmi_setup(void); int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio); void snp_accept_memory(phys_addr_t start, phys_addr_t end); u64 snp_get_unsupported_features(u64 status); @@ -244,12 +244,12 @@ static inline void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { } static inline void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages) { } -static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { } static inline void snp_set_memory_shared(unsigned long vaddr, unsigned long npages) { } static inline void snp_set_memory_private(unsigned long vaddr, unsigned long npages) { } static inline void snp_set_wakeup_secondary_cpu(void) { } static inline bool snp_init(struct boot_params *bp) { return false; } static inline void snp_abort(void) { } +static inline void snp_dmi_setup(void) { } static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio) { return -ENOTTY; diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index b89b40f250e6..6149eabe200f 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -30,12 +30,13 @@ struct x86_init_mpparse { * @reserve_resources: reserve the standard resources for the * platform * @memory_setup: platform specific memory setup - * + * @dmi_setup: platform specific DMI setup */ struct x86_init_resources { void (*probe_roms)(void); void (*reserve_resources)(void); char *(*memory_setup)(void); + void (*dmi_setup)(void); }; /** diff --git a/arch/x86/kernel/eisa.c b/arch/x86/kernel/eisa.c index e963344b0449..53935b4d62e3 100644 --- a/arch/x86/kernel/eisa.c +++ b/arch/x86/kernel/eisa.c @@ -2,6 +2,7 @@ /* * EISA specific code */ +#include <linux/cc_platform.h> #include <linux/ioport.h> #include <linux/eisa.h> #include <linux/io.h> @@ -12,7 +13,7 @@ static __init int eisa_bus_probe(void) { void __iomem *p; - if (xen_pv_domain() && !xen_initial_domain()) + if ((xen_pv_domain() && !xen_initial_domain()) || cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) return 0; p = ioremap(0x0FFFD9, 4); diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c index 319fef37d9dc..cc2c34ba7228 100644 --- a/arch/x86/kernel/probe_roms.c +++ b/arch/x86/kernel/probe_roms.c @@ -203,16 +203,6 @@ void __init probe_roms(void) unsigned char c; int i; - /* - * The ROM memory range is not part of the e820 table and is therefore not - * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted - * memory, and SNP requires encrypted memory to be validated before access. - * Do that here. - */ - snp_prep_memory(video_rom_resource.start, - ((system_rom_resource.end + 1) - video_rom_resource.start), - SNP_PAGE_STATE_PRIVATE); - /* video rom */ upper = adapter_rom_resources[0].start; for (start = video_rom_resource.start; start < upper; start += 2048) { diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index ef206500ed6f..0109e6c510e0 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -9,7 +9,6 @@ #include <linux/console.h> #include <linux/crash_dump.h> #include <linux/dma-map-ops.h> -#include <linux/dmi.h> #include <linux/efi.h> #include <linux/ima.h> #include <linux/init_ohci1394_dma.h> @@ -902,7 +901,7 @@ void __init setup_arch(char **cmdline_p) efi_init(); reserve_ibft_region(); - dmi_setup(); + x86_init.resources.dmi_setup(); /* * VMware detection requires dmi to be available, so this diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index b59b09c2f284..7e1e63cc48e6 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -23,6 +23,7 @@ #include <linux/platform_device.h> #include <linux/io.h> #include <linux/psp-sev.h> +#include <linux/dmi.h> #include <uapi/linux/sev-guest.h> #include <asm/init.h> @@ -795,21 +796,6 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED); } -void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) -{ - unsigned long vaddr, npages; - - vaddr = (unsigned long)__va(paddr); - npages = PAGE_ALIGN(sz) >> PAGE_SHIFT; - - if (op == SNP_PAGE_STATE_PRIVATE) - early_snp_set_memory_private(vaddr, paddr, npages); - else if (op == SNP_PAGE_STATE_SHARED) - early_snp_set_memory_shared(vaddr, paddr, npages); - else - WARN(1, "invalid memory op %d\n", op); -} - static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr, unsigned long vaddr_end, int op) { @@ -2136,6 +2122,17 @@ void __head __noreturn snp_abort(void) sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED); } +/* + * SEV-SNP guests should only execute dmi_setup() if EFI_CONFIG_TABLES are + * enabled, as the alternative (fallback) logic for DMI probing in the legacy + * ROM region can cause a crash since this region is not pre-validated. + */ +void __init snp_dmi_setup(void) +{ + if (efi_enabled(EFI_CONFIG_TABLES)) + dmi_setup(); +} + static void dump_cpuid_table(void) { const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table(); diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index a42830dc151b..d5dc5a92635a 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -3,6 +3,7 @@ * * For licencing details see kernel-base/COPYING */ +#include <linux/dmi.h> #include <linux/init.h> #include <linux/ioport.h> #include <linux/export.h> @@ -66,6 +67,7 @@ struct x86_init_ops x86_init __initdata = { .probe_roms = probe_roms, .reserve_resources = reserve_standard_io_resources, .memory_setup = e820__memory_setup_default, + .dmi_setup = dmi_setup, }, .mpparse = { diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c index 70b91de2e053..422602f6039b 100644 --- a/arch/x86/mm/mem_encrypt_amd.c +++ b/arch/x86/mm/mem_encrypt_amd.c @@ -492,6 +492,24 @@ void __init sme_early_init(void) */ if (sev_status & MSR_AMD64_SEV_ENABLED) ia32_disable(); + + /* + * Override init functions that scan the ROM region in SEV-SNP guests, + * as this memory is not pre-validated and would thus cause a crash. + */ + if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) { + x86_init.mpparse.find_mptable = x86_init_noop; + x86_init.pci.init_irq = x86_init_noop; + x86_init.resources.probe_roms = x86_init_noop; + + /* + * DMI setup behavior for SEV-SNP guests depends on + * efi_enabled(EFI_CONFIG_TABLES), which hasn't been + * parsed yet. snp_dmi_setup() will run after that + * parsing has happened. + */ + x86_init.resources.dmi_setup = snp_dmi_setup; + } } void __init mem_encrypt_free_decrypted_mem(void)

1 year, 8 months

1
0
0 0

FAILED: patch "[PATCH] x86/sev: Skip ROM range scans and validation for SEV-SNP" failed to apply to 6.7-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.7-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.7.y git checkout FETCH_HEAD git cherry-pick -x 0f4a1e80989aca185d955fcd791d7750082044a2 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040105-swarm-prenatal-5d72@gregkh' --subject-prefix 'PATCH 6.7.y' HEAD^.. Possible dependencies: 0f4a1e80989a ("x86/sev: Skip ROM range scans and validation for SEV-SNP guests") 428080c9b19b ("x86/sev: Move early startup code into .head.text section") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 0f4a1e80989aca185d955fcd791d7750082044a2 Mon Sep 17 00:00:00 2001 From: Kevin Loughlin <kevinloughlin(a)google.com> Date: Wed, 13 Mar 2024 12:15:46 +0000 Subject: [PATCH] x86/sev: Skip ROM range scans and validation for SEV-SNP guests SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes to access this range, the guest must first validate the range. The current SEV-SNP code does indeed scan the ROM range during early boot and thus attempts to validate the ROM range in probe_roms(). However, this behavior is neither sufficient nor necessary for the following reasons: * With regards to sufficiency, if EFI_CONFIG_TABLES are not enabled and CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which falls in the ROM range) prior to validation. For example, Project Oak Stage 0 provides a minimal guest firmware that currently meets these configuration conditions, meaning guests booting atop Oak Stage 0 firmware encounter a problematic call chain during dmi_setup() -> dmi_scan_machine() that results in a crash during boot if SEV-SNP is enabled. * With regards to necessity, SEV-SNP guests generally read garbage (which changes across boots) from the ROM range, meaning these scans are unnecessary. The guest reads garbage because the legacy ROM range is unencrypted data but is accessed via an encrypted PMD during early boot (where the PMD is marked as encrypted due to potentially mapping actually-encrypted data in other PMD-contained ranges). In one exceptional case, EISA probing treats the ROM range as unencrypted data, which is inconsistent with other probing. Continuing to allow SEV-SNP guests to use garbage and to inconsistently classify ROM range encryption status can trigger undesirable behavior. For instance, if garbage bytes appear to be a valid signature, memory may be unnecessarily reserved for the ROM range. Future code or other use cases may result in more problematic (arbitrary) behavior that should be avoided. While one solution would be to overhaul the early PMD mapping to always treat the ROM region of the PMD as unencrypted, SEV-SNP guests do not currently rely on data from the ROM region during early boot (and even if they did, they would be mostly relying on garbage data anyways). As a simpler solution, skip the ROM range scans (and the otherwise- necessary range validation) during SEV-SNP guest early boot. The potential SEV-SNP guest crash due to lack of ROM range validation is thus avoided by simply not accessing the ROM range. In most cases, skip the scans by overriding problematic x86_init functions during sme_early_init() to SNP-safe variants, which can be likened to x86_init overrides done for other platforms (ex: Xen); such overrides also avoid the spread of cc_platform_has() checks throughout the tree. In the exceptional EISA case, still use cc_platform_has() for the simplest change, given (1) checks for guest type (ex: Xen domain status) are already performed here, and (2) these checks occur in a subsys initcall instead of an x86_init function. [ bp: Massage commit message, remove "we"s. ] Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active") Signed-off-by: Kevin Loughlin <kevinloughlin(a)google.com> Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: <stable(a)kernel.org> Link: https://lore.kernel.org/r/20240313121546.2964854-1-kevinloughlin@google.com diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index 9477b4053bce..07e125f32528 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -218,12 +218,12 @@ void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages); void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages); -void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op); void snp_set_memory_shared(unsigned long vaddr, unsigned long npages); void snp_set_memory_private(unsigned long vaddr, unsigned long npages); void snp_set_wakeup_secondary_cpu(void); bool snp_init(struct boot_params *bp); void __noreturn snp_abort(void); +void snp_dmi_setup(void); int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio); void snp_accept_memory(phys_addr_t start, phys_addr_t end); u64 snp_get_unsupported_features(u64 status); @@ -244,12 +244,12 @@ static inline void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { } static inline void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages) { } -static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { } static inline void snp_set_memory_shared(unsigned long vaddr, unsigned long npages) { } static inline void snp_set_memory_private(unsigned long vaddr, unsigned long npages) { } static inline void snp_set_wakeup_secondary_cpu(void) { } static inline bool snp_init(struct boot_params *bp) { return false; } static inline void snp_abort(void) { } +static inline void snp_dmi_setup(void) { } static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio) { return -ENOTTY; diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index b89b40f250e6..6149eabe200f 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -30,12 +30,13 @@ struct x86_init_mpparse { * @reserve_resources: reserve the standard resources for the * platform * @memory_setup: platform specific memory setup - * + * @dmi_setup: platform specific DMI setup */ struct x86_init_resources { void (*probe_roms)(void); void (*reserve_resources)(void); char *(*memory_setup)(void); + void (*dmi_setup)(void); }; /** diff --git a/arch/x86/kernel/eisa.c b/arch/x86/kernel/eisa.c index e963344b0449..53935b4d62e3 100644 --- a/arch/x86/kernel/eisa.c +++ b/arch/x86/kernel/eisa.c @@ -2,6 +2,7 @@ /* * EISA specific code */ +#include <linux/cc_platform.h> #include <linux/ioport.h> #include <linux/eisa.h> #include <linux/io.h> @@ -12,7 +13,7 @@ static __init int eisa_bus_probe(void) { void __iomem *p; - if (xen_pv_domain() && !xen_initial_domain()) + if ((xen_pv_domain() && !xen_initial_domain()) || cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) return 0; p = ioremap(0x0FFFD9, 4); diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c index 319fef37d9dc..cc2c34ba7228 100644 --- a/arch/x86/kernel/probe_roms.c +++ b/arch/x86/kernel/probe_roms.c @@ -203,16 +203,6 @@ void __init probe_roms(void) unsigned char c; int i; - /* - * The ROM memory range is not part of the e820 table and is therefore not - * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted - * memory, and SNP requires encrypted memory to be validated before access. - * Do that here. - */ - snp_prep_memory(video_rom_resource.start, - ((system_rom_resource.end + 1) - video_rom_resource.start), - SNP_PAGE_STATE_PRIVATE); - /* video rom */ upper = adapter_rom_resources[0].start; for (start = video_rom_resource.start; start < upper; start += 2048) { diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index ef206500ed6f..0109e6c510e0 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -9,7 +9,6 @@ #include <linux/console.h> #include <linux/crash_dump.h> #include <linux/dma-map-ops.h> -#include <linux/dmi.h> #include <linux/efi.h> #include <linux/ima.h> #include <linux/init_ohci1394_dma.h> @@ -902,7 +901,7 @@ void __init setup_arch(char **cmdline_p) efi_init(); reserve_ibft_region(); - dmi_setup(); + x86_init.resources.dmi_setup(); /* * VMware detection requires dmi to be available, so this diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index b59b09c2f284..7e1e63cc48e6 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -23,6 +23,7 @@ #include <linux/platform_device.h> #include <linux/io.h> #include <linux/psp-sev.h> +#include <linux/dmi.h> #include <uapi/linux/sev-guest.h> #include <asm/init.h> @@ -795,21 +796,6 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED); } -void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) -{ - unsigned long vaddr, npages; - - vaddr = (unsigned long)__va(paddr); - npages = PAGE_ALIGN(sz) >> PAGE_SHIFT; - - if (op == SNP_PAGE_STATE_PRIVATE) - early_snp_set_memory_private(vaddr, paddr, npages); - else if (op == SNP_PAGE_STATE_SHARED) - early_snp_set_memory_shared(vaddr, paddr, npages); - else - WARN(1, "invalid memory op %d\n", op); -} - static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr, unsigned long vaddr_end, int op) { @@ -2136,6 +2122,17 @@ void __head __noreturn snp_abort(void) sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED); } +/* + * SEV-SNP guests should only execute dmi_setup() if EFI_CONFIG_TABLES are + * enabled, as the alternative (fallback) logic for DMI probing in the legacy + * ROM region can cause a crash since this region is not pre-validated. + */ +void __init snp_dmi_setup(void) +{ + if (efi_enabled(EFI_CONFIG_TABLES)) + dmi_setup(); +} + static void dump_cpuid_table(void) { const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table(); diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index a42830dc151b..d5dc5a92635a 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -3,6 +3,7 @@ * * For licencing details see kernel-base/COPYING */ +#include <linux/dmi.h> #include <linux/init.h> #include <linux/ioport.h> #include <linux/export.h> @@ -66,6 +67,7 @@ struct x86_init_ops x86_init __initdata = { .probe_roms = probe_roms, .reserve_resources = reserve_standard_io_resources, .memory_setup = e820__memory_setup_default, + .dmi_setup = dmi_setup, }, .mpparse = { diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c index 70b91de2e053..422602f6039b 100644 --- a/arch/x86/mm/mem_encrypt_amd.c +++ b/arch/x86/mm/mem_encrypt_amd.c @@ -492,6 +492,24 @@ void __init sme_early_init(void) */ if (sev_status & MSR_AMD64_SEV_ENABLED) ia32_disable(); + + /* + * Override init functions that scan the ROM region in SEV-SNP guests, + * as this memory is not pre-validated and would thus cause a crash. + */ + if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) { + x86_init.mpparse.find_mptable = x86_init_noop; + x86_init.pci.init_irq = x86_init_noop; + x86_init.resources.probe_roms = x86_init_noop; + + /* + * DMI setup behavior for SEV-SNP guests depends on + * efi_enabled(EFI_CONFIG_TABLES), which hasn't been + * parsed yet. snp_dmi_setup() will run after that + * parsing has happened. + */ + x86_init.resources.dmi_setup = snp_dmi_setup; + } } void __init mem_encrypt_free_decrypted_mem(void)

1 year, 8 months

1
0
0 0

[PATCH v4.19-v5.4] vt: fix memory overlapping when deleting chars in the buffer

by Kuntal Nayak

From: Yangxi Xiang <xyangxi5(a)gmail.com> [ upstream commit 39cdb68c64d8 ] A memory overlapping copy occurs when deleting a long line. This memory overlapping copy can cause data corruption when scr_memcpyw is optimized to memcpy because memcpy does not ensure its behavior if the destination buffer overlaps with the source buffer. The line buffer is not always broken, because the memcpy utilizes the hardware acceleration, whose result is not deterministic. Fix this problem by using replacing the scr_memcpyw with scr_memmovew. Fixes: 81732c3b2fed ("tty vt: Fix line garbage in virtual console on command line edition") Cc: stable <stable(a)kernel.org> Signed-off-by: Yangxi Xiang <xyangxi5(a)gmail.com> Link: https://lore.kernel.org/r/20220628093322.5688-1-xyangxi5@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> [ KN: vc_state is not a separate structure in LTS v4.19, v5.4. Adjusted the patch accordingly by using vc_x instead of state.x for backport. ] Signed-off-by: Kuntal Nayak <kuntal.nayak(a)broadcom.com> --- drivers/tty/vt/vt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index c9083d853..a351e264d 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -855,7 +855,7 @@ static void delete_char(struct vc_data *vc, unsigned int nr) unsigned short *p = (unsigned short *) vc->vc_pos; vc_uniscr_delete(vc, nr); - scr_memcpyw(p, p + nr, (vc->vc_cols - vc->vc_x - nr) * 2); + scr_memmovew(p, p + nr, (vc->vc_cols - vc->vc_x - nr) * 2); scr_memsetw(p + vc->vc_cols - vc->vc_x - nr, vc->vc_video_erase_char, nr * 2); vc->vc_need_wrap = 0; -- 2.39.0

1 year, 8 months

3
3
0 0

[RFC PATCH v2 1/6] cxl/core: correct length of DPA field masks

by Shiyang Ruan

The length of Physical Address in General Media Event Record/DRAM Event Record is 64-bit, so the field mask should be defined as such length. Otherwise, this causes cxl_general_media and cxl_dram tracepoints to mask off the upper-32-bits of DPA addresses. The cxl_poison event is unaffected. If userspace was doing its own DPA-to-HPA translation this could lead to incorrect page retirement decisions, but there is no known consumer (like rasdaemon) of this event today. Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") Cc: <stable(a)vger.kernel.org> Cc: Dan Williams <dan.j.williams(a)intel.com> Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Signed-off-by: Shiyang Ruan <ruansy.fnst(a)fujitsu.com> --- drivers/cxl/core/trace.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index e5f13260fc52..e2d1f296df97 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -253,11 +253,11 @@ TRACE_EVENT(cxl_generic_event, * DRAM Event Record * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 */ -#define CXL_DPA_FLAGS_MASK 0x3F +#define CXL_DPA_FLAGS_MASK 0x3FULL #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) -#define CXL_DPA_VOLATILE BIT(0) -#define CXL_DPA_NOT_REPAIRABLE BIT(1) +#define CXL_DPA_VOLATILE BIT_ULL(0) +#define CXL_DPA_NOT_REPAIRABLE BIT_ULL(1) #define show_dpa_flags(flags) __print_flags(flags, "|", \ { CXL_DPA_VOLATILE, "VOLATILE" }, \ { CXL_DPA_NOT_REPAIRABLE, "NOT_REPAIRABLE" } \ -- 2.34.1

1 year, 8 months

2
2
0 0

[PATCH] net: usb: ax88179_178a: avoid the interface always configured as random address

by Jose Ignacio Tornos Martinez

After the commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets"), reset is not executed from bind operation and mac address is not read from the device registers or the devicetree at that moment. Since the check to configure if the assigned mac address is random or not for the interface, happens after the bind operation from usbnet_probe, the interface keeps configured as random address, although the address is correctly read and set during open operation (the only reset now). In order to keep only one reset for the device and to avoid the interface always configured as random address, after reset, configure correctly the suitable field from the driver, if the mac address is read successfully from the device registers or the devicetree. In addition, if mac address can not be read from the driver, a random address is configured again, so it is not necessary to call eth_hw_addr_random from here. Indeed, in this situtatuon, when reset was also executed from bind, this was invalidating the check to configure if the assigned mac address for the interface was random or not. cc: stable(a)vger.kernel.org # 6.6+ Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets") Reported-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com> --- drivers/net/usb/ax88179_178a.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c index 88e084534853..d2324cc02461 100644 --- a/drivers/net/usb/ax88179_178a.c +++ b/drivers/net/usb/ax88179_178a.c @@ -1273,10 +1273,9 @@ static void ax88179_get_mac_addr(struct usbnet *dev) if (is_valid_ether_addr(mac)) { eth_hw_addr_set(dev->net, mac); - } else { + dev->net->addr_assign_type = NET_ADDR_PERM; + } else netdev_info(dev->net, "invalid MAC address, using random\n"); - eth_hw_addr_random(dev->net); - } ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN, ETH_ALEN, dev->net->dev_addr); -- 2.44.0

1 year, 8 months

2
6
0 0

[REGRESSION] external monitor+Dell dock in 6.8

by Andrei Gaponenko

Hello, I noticed a regression with the mailine kernel pre-compiled by EPEL. I have just tried linux-6.9-rc1.tar.gz from kernel.org, and it still misbehaves. The default setup: a laptop is connected to a dock, Dell WD22TB4, via a USB-C cable. The dock is connected to an external monitor via a Display Port cable. With a "good" kernel everything works. With a "broken" kernel, the external monitor is still correctly identified by the system, and is shown as enabled in plasma systemsettings. The system also behaves like the monitor is working, for example, one can move the mouse pointer off the laptop screen. However the external monitor screen stays black, and it eventually goes to sleep. Everything worked with EPEL mainline kernels up to and including kernel-ml-6.7.9-1.el9.elrepo.x86_64 The breakage is observed in kernel-ml-6.8.1-1.el9.elrepo.x86_64 kernel-ml-6.8.2-1.el9.elrepo.x86_64 linux-6.9-rc1.tar.gz from kernel.org (with olddefconfig) Other tests: using an HDMI cable instead of the Display Port cable between the monitor and the dock does not change things, black screen with the newer kernels. Using a small HDMI-to-USB-C adapter instead of the dock results in a working system, even with the newer kernels. So the breakage appears to be specific to the Dell WD22TB4 dock. Operating System: AlmaLinux 9.3 (Shamrock Pampas Cat) uname -mi: x86_64 x86_64 Laptop: Dell Precision 5470/02RK6V lsusb |grep dock Bus 003 Device 007: ID 413c:b06e Dell Computer Corp. Dell dock Bus 003 Device 008: ID 413c:b06f Dell Computer Corp. Dell dock Bus 003 Device 006: ID 0bda:5413 Realtek Semiconductor Corp. Dell dock Bus 003 Device 005: ID 0bda:5487 Realtek Semiconductor Corp. Dell dock Bus 002 Device 004: ID 0bda:0413 Realtek Semiconductor Corp. Dell dock Bus 002 Device 003: ID 0bda:0487 Realtek Semiconductor Corp. Dell dock dmesg and kernel config are attached to https://bugzilla.kernel.org/show_bug.cgi?id=218663 #regzbot introduced: v6.7.9..v6.8.1 Andrei

1 year, 8 months

2
4
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror April 2024