This is the start of the stable review cycle for the 4.4.263 release. There are 14 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Mar 2021 12:19:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.263-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.4.263-rc1
Thomas Gleixner tglx@linutronix.de genirq: Disable interrupts for force threaded handlers
Shijie Luo luoshijie1@huawei.com ext4: fix potential error in ext4_do_update_inode
zhangyi (F) yi.zhang@huawei.com ext4: find old entry again if failed to rename whiteout
Thomas Gleixner tglx@linutronix.de x86/ioapic: Ignore IRQ2 again
Tyrel Datwyler tyreld@linux.ibm.com PCI: rpadlpar: Fix potential drc_name corruption in store functions
Jim Lin jilin@nvidia.com usb: gadget: configfs: Fix KASAN use-after-free
Macpaul Lin macpaul.lin@mediatek.com USB: replace hardcode maximum usb string length by definition
Dan Carpenter dan.carpenter@oracle.com scsi: lpfc: Fix some error codes in debugfs
Joe Korty joe.korty@concurrent-rt.com NFSD: Repair misuse of sv_lock in 5.10.16-rt30.
Filipe Manana fdmanana@suse.com btrfs: fix race when cloning extent buffer during rewind of an old root
Gwendal Grignou gwendal@chromium.org platform/chrome: cros_ec_dev - Fix security issue
Jan Kara jack@suse.cz ext4: check journal inode extents more carefully
Jan Kara jack@suse.cz ext4: don't allow overlapping system zones
Jan Kara jack@suse.cz ext4: handle error of ext4_setup_system_zone() on remount
-------------
Diffstat:
Makefile | 4 +- arch/x86/kernel/apic/io_apic.c | 10 +++++ drivers/pci/hotplug/rpadlpar_sysfs.c | 14 +++---- drivers/platform/chrome/cros_ec_dev.c | 4 ++ drivers/platform/chrome/cros_ec_proto.c | 4 +- drivers/scsi/lpfc/lpfc_debugfs.c | 4 +- drivers/usb/gadget/composite.c | 4 +- drivers/usb/gadget/configfs.c | 16 +++++--- drivers/usb/gadget/usbstring.c | 4 +- fs/btrfs/ctree.c | 2 + fs/ext4/block_validity.c | 71 +++++++++++++++------------------ fs/ext4/ext4.h | 6 +-- fs/ext4/extents.c | 16 +++----- fs/ext4/indirect.c | 6 +-- fs/ext4/inode.c | 13 +++--- fs/ext4/mballoc.c | 4 +- fs/ext4/namei.c | 29 +++++++++++++- fs/ext4/super.c | 5 ++- include/linux/mfd/cros_ec.h | 6 ++- include/uapi/linux/usb/ch9.h | 3 ++ kernel/irq/manage.c | 4 ++ net/sunrpc/svc_xprt.c | 4 +- 22 files changed, 139 insertions(+), 94 deletions(-)
From: Jan Kara jack@suse.cz
commit d176b1f62f242ab259ff665a26fbac69db1aecba upstream.
ext4_setup_system_zone() can fail. Handle the failure in ext4_remount().
Reviewed-by: Lukas Czerner lczerner@redhat.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20200728130437.7804-2-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/super.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4968,7 +4968,10 @@ static int ext4_remount(struct super_blo ext4_register_li_request(sb, first_not_zeroed); }
- ext4_setup_system_zone(sb); + err = ext4_setup_system_zone(sb); + if (err) + goto restore_opts; + if (sbi->s_journal == NULL && !(old_sb_flags & MS_RDONLY)) ext4_commit_super(sb, 1);
From: Jan Kara jack@suse.cz
commit bf9a379d0980e7413d94cb18dac73db2bfc5f470 upstream.
Currently, add_system_zone() just silently merges two added system zones that overlap. However the overlap should not happen and it generally suggests that some unrelated metadata overlap which indicates the fs is corrupted. We should have caught such problems earlier (e.g. in ext4_check_descriptors()) but add this check as another line of defense. In later patch we also use this for stricter checking of journal inode extent tree.
Reviewed-by: Lukas Czerner lczerner@redhat.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20200728130437.7804-3-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/block_validity.c | 34 ++++++++++++---------------------- 1 file changed, 12 insertions(+), 22 deletions(-)
--- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -57,7 +57,7 @@ static int add_system_zone(struct ext4_s ext4_fsblk_t start_blk, unsigned int count) { - struct ext4_system_zone *new_entry = NULL, *entry; + struct ext4_system_zone *new_entry, *entry; struct rb_node **n = &sbi->system_blks.rb_node, *node; struct rb_node *parent = NULL, *new_node = NULL;
@@ -68,30 +68,20 @@ static int add_system_zone(struct ext4_s n = &(*n)->rb_left; else if (start_blk >= (entry->start_blk + entry->count)) n = &(*n)->rb_right; - else { - if (start_blk + count > (entry->start_blk + - entry->count)) - entry->count = (start_blk + count - - entry->start_blk); - new_node = *n; - new_entry = rb_entry(new_node, struct ext4_system_zone, - node); - break; - } + else /* Unexpected overlap of system zones. */ + return -EFSCORRUPTED; }
- if (!new_entry) { - new_entry = kmem_cache_alloc(ext4_system_zone_cachep, - GFP_KERNEL); - if (!new_entry) - return -ENOMEM; - new_entry->start_blk = start_blk; - new_entry->count = count; - new_node = &new_entry->node; + new_entry = kmem_cache_alloc(ext4_system_zone_cachep, + GFP_KERNEL); + if (!new_entry) + return -ENOMEM; + new_entry->start_blk = start_blk; + new_entry->count = count; + new_node = &new_entry->node;
- rb_link_node(new_node, parent, n); - rb_insert_color(new_node, &sbi->system_blks); - } + rb_link_node(new_node, parent, n); + rb_insert_color(new_node, &sbi->system_blks);
/* Can we merge to the left? */ node = rb_prev(new_node);
From: Jan Kara jack@suse.cz
commit ce9f24cccdc019229b70a5c15e2b09ad9c0ab5d1 upstream.
Currently, system zones just track ranges of block, that are "important" fs metadata (bitmaps, group descriptors, journal blocks, etc.). This however complicates how extent tree (or indirect blocks) can be checked for inodes that actually track such metadata - currently the journal inode but arguably we should be treating quota files or resize inode similarly. We cannot run __ext4_ext_check() on such metadata inodes when loading their extents as that would immediately trigger the validity checks and so we just hack around that and special-case the journal inode. This however leads to a situation that a journal inode which has extent tree of depth at least one can have invalid extent tree that gets unnoticed until ext4_cache_extents() crashes.
To overcome this limitation, track inode number each system zone belongs to (0 is used for zones not belonging to any inode). We can then verify inode number matches the expected one when verifying extent tree and thus avoid the false errors. With this there's no need to to special-case journal inode during extent tree checking anymore so remove it.
Fixes: 0a944e8a6c66 ("ext4: don't perform block validity checks on the journal inode") Reported-by: Wolfgang Frisch wolfgang.frisch@suse.com Reviewed-by: Lukas Czerner lczerner@redhat.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20200728130437.7804-4-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/block_validity.c | 37 +++++++++++++++++++++---------------- fs/ext4/ext4.h | 6 +++--- fs/ext4/extents.c | 16 ++++++---------- fs/ext4/indirect.c | 6 ++---- fs/ext4/inode.c | 5 ++--- fs/ext4/mballoc.c | 4 ++-- 6 files changed, 36 insertions(+), 38 deletions(-)
--- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -23,6 +23,7 @@ struct ext4_system_zone { struct rb_node node; ext4_fsblk_t start_blk; unsigned int count; + u32 ino; };
static struct kmem_cache *ext4_system_zone_cachep; @@ -43,7 +44,8 @@ void ext4_exit_system_zone(void) static inline int can_merge(struct ext4_system_zone *entry1, struct ext4_system_zone *entry2) { - if ((entry1->start_blk + entry1->count) == entry2->start_blk) + if ((entry1->start_blk + entry1->count) == entry2->start_blk && + entry1->ino == entry2->ino) return 1; return 0; } @@ -55,7 +57,7 @@ static inline int can_merge(struct ext4_ */ static int add_system_zone(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk, - unsigned int count) + unsigned int count, u32 ino) { struct ext4_system_zone *new_entry, *entry; struct rb_node **n = &sbi->system_blks.rb_node, *node; @@ -78,6 +80,7 @@ static int add_system_zone(struct ext4_s return -ENOMEM; new_entry->start_blk = start_blk; new_entry->count = count; + new_entry->ino = ino; new_node = &new_entry->node;
rb_link_node(new_node, parent, n); @@ -153,16 +156,16 @@ static int ext4_protect_reserved_inode(s if (n == 0) { i++; } else { - if (!ext4_data_block_valid(sbi, map.m_pblk, n)) { - ext4_error(sb, "blocks %llu-%llu from inode %u " + err = add_system_zone(sbi, map.m_pblk, n, ino); + if (err < 0) { + if (err == -EFSCORRUPTED) { + ext4_error(sb, + "blocks %llu-%llu from inode %u " "overlap system zone", map.m_pblk, map.m_pblk + map.m_len - 1, ino); - err = -EFSCORRUPTED; + } break; } - err = add_system_zone(sbi, map.m_pblk, n); - if (err < 0) - break; i += n; } } @@ -191,16 +194,16 @@ int ext4_setup_system_zone(struct super_ if (ext4_bg_has_super(sb, i) && ((i < 5) || ((i % flex_size) == 0))) add_system_zone(sbi, ext4_group_first_block_no(sb, i), - ext4_bg_num_gdb(sb, i) + 1); + ext4_bg_num_gdb(sb, i) + 1, 0); gdp = ext4_get_group_desc(sb, i, NULL); - ret = add_system_zone(sbi, ext4_block_bitmap(sb, gdp), 1); + ret = add_system_zone(sbi, ext4_block_bitmap(sb, gdp), 1, 0); if (ret) return ret; - ret = add_system_zone(sbi, ext4_inode_bitmap(sb, gdp), 1); + ret = add_system_zone(sbi, ext4_inode_bitmap(sb, gdp), 1, 0); if (ret) return ret; ret = add_system_zone(sbi, ext4_inode_table(sb, gdp), - sbi->s_itb_per_group); + sbi->s_itb_per_group, 0); if (ret) return ret; } @@ -233,10 +236,11 @@ void ext4_release_system_zone(struct sup * start_blk+count) is valid; 0 if some part of the block region * overlaps with filesystem metadata blocks. */ -int ext4_data_block_valid(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk, - unsigned int count) +int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk, + unsigned int count) { struct ext4_system_zone *entry; + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); struct rb_node *n = sbi->system_blks.rb_node;
if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) || @@ -252,6 +256,8 @@ int ext4_data_block_valid(struct ext4_sb else if (start_blk >= (entry->start_blk + entry->count)) n = n->rb_right; else { + if (entry->ino == inode->i_ino) + return 1; sbi->s_es->s_last_error_block = cpu_to_le64(start_blk); return 0; } @@ -274,8 +280,7 @@ int ext4_check_blockref(const char *func while (bref < p+max) { blk = le32_to_cpu(*bref++); if (blk && - unlikely(!ext4_data_block_valid(EXT4_SB(inode->i_sb), - blk, 1))) { + unlikely(!ext4_inode_block_valid(inode, blk, 1))) { es->s_last_error_block = cpu_to_le64(blk); ext4_error_inode(inode, function, line, blk, "invalid block"); --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3134,9 +3134,9 @@ extern void ext4_release_system_zone(str extern int ext4_setup_system_zone(struct super_block *sb); extern int __init ext4_init_system_zone(void); extern void ext4_exit_system_zone(void); -extern int ext4_data_block_valid(struct ext4_sb_info *sbi, - ext4_fsblk_t start_blk, - unsigned int count); +extern int ext4_inode_block_valid(struct inode *inode, + ext4_fsblk_t start_blk, + unsigned int count); extern int ext4_check_blockref(const char *, unsigned int, struct inode *, __le32 *, unsigned int);
--- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -384,7 +384,7 @@ static int ext4_valid_extent(struct inod */ if (lblock + len <= lblock) return 0; - return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, len); + return ext4_inode_block_valid(inode, block, len); }
static int ext4_valid_extent_idx(struct inode *inode, @@ -392,7 +392,7 @@ static int ext4_valid_extent_idx(struct { ext4_fsblk_t block = ext4_idx_pblock(ext_idx);
- return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, 1); + return ext4_inode_block_valid(inode, block, 1); }
static int ext4_valid_extent_entries(struct inode *inode, @@ -549,14 +549,10 @@ __read_extent_tree_block(const char *fun } if (buffer_verified(bh) && !(flags & EXT4_EX_FORCE_CACHE)) return bh; - if (!ext4_has_feature_journal(inode->i_sb) || - (inode->i_ino != - le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) { - err = __ext4_ext_check(function, line, inode, - ext_block_hdr(bh), depth, pblk); - if (err) - goto errout; - } + err = __ext4_ext_check(function, line, inode, + ext_block_hdr(bh), depth, pblk); + if (err) + goto errout; set_buffer_verified(bh); /* * If this is a leaf block, cache all of its entries --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -946,8 +946,7 @@ static int ext4_clear_blocks(handle_t *h else if (ext4_should_journal_data(inode)) flags |= EXT4_FREE_BLOCKS_FORGET;
- if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), block_to_free, - count)) { + if (!ext4_inode_block_valid(inode, block_to_free, count)) { EXT4_ERROR_INODE(inode, "attempt to clear invalid " "blocks %llu len %lu", (unsigned long long) block_to_free, count); @@ -1109,8 +1108,7 @@ static void ext4_free_branches(handle_t if (!nr) continue; /* A hole */
- if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), - nr, 1)) { + if (!ext4_inode_block_valid(inode, nr, 1)) { EXT4_ERROR_INODE(inode, "invalid indirect mapped " "block %lu (level %d)", --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -381,8 +381,7 @@ static int __check_block_validity(struct (inode->i_ino == le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) return 0; - if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), map->m_pblk, - map->m_len)) { + if (!ext4_inode_block_valid(inode, map->m_pblk, map->m_len)) { ext4_error_inode(inode, func, line, map->m_pblk, "lblock %lu mapped to illegal pblock %llu " "(length %d)", (unsigned long) map->m_lblk, @@ -4437,7 +4436,7 @@ struct inode *__ext4_iget(struct super_b
ret = 0; if (ei->i_file_acl && - !ext4_data_block_valid(EXT4_SB(sb), ei->i_file_acl, 1)) { + !ext4_inode_block_valid(inode, ei->i_file_acl, 1)) { ext4_error_inode(inode, function, line, 0, "iget: bad extended attribute block %llu", ei->i_file_acl); --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2960,7 +2960,7 @@ ext4_mb_mark_diskspace_used(struct ext4_ block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - if (!ext4_data_block_valid(sbi, block, len)) { + if (!ext4_inode_block_valid(ac->ac_inode, block, len)) { ext4_error(sb, "Allocating blocks %llu-%llu which overlap " "fs metadata", block, block+len); /* File system mounted not to panic on error @@ -4718,7 +4718,7 @@ void ext4_free_blocks(handle_t *handle,
sbi = EXT4_SB(sb); if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) && - !ext4_data_block_valid(sbi, block, count)) { + !ext4_inode_block_valid(inode, block, count)) { ext4_error(sb, "Freeing blocks not in datazone - " "block = %llu, count = %lu", block, count); goto error_return;
From: Gwendal Grignou gwendal@chromium.org
commit 5d749d0bbe811c10d9048cde6dfebc761713abfd upstream.
Prevent memory scribble by checking that ioctl buffer size parameters are sane. Without this check, on 32 bits system, if .insize = 0xffffffff - 20 and .outsize the amount to scribble, we would overflow, allocate a small amounts and be able to write outside of the malloc'ed area. Adding a hard limit allows argument checking of the ioctl. With the current EC, it is expected .insize and .outsize to be at around 512 bytes or less.
Signed-off-by: Gwendal Grignou gwendal@chromium.org Signed-off-by: Olof Johansson olof@lixom.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/chrome/cros_ec_dev.c | 4 ++++ drivers/platform/chrome/cros_ec_proto.c | 4 ++-- include/linux/mfd/cros_ec.h | 6 ++++-- 3 files changed, 10 insertions(+), 4 deletions(-)
--- a/drivers/platform/chrome/cros_ec_dev.c +++ b/drivers/platform/chrome/cros_ec_dev.c @@ -137,6 +137,10 @@ static long ec_device_ioctl_xcmd(struct if (copy_from_user(&u_cmd, arg, sizeof(u_cmd))) return -EFAULT;
+ if ((u_cmd.outsize > EC_MAX_MSG_BYTES) || + (u_cmd.insize > EC_MAX_MSG_BYTES)) + return -EINVAL; + s_cmd = kmalloc(sizeof(*s_cmd) + max(u_cmd.outsize, u_cmd.insize), GFP_KERNEL); if (!s_cmd) --- a/drivers/platform/chrome/cros_ec_proto.c +++ b/drivers/platform/chrome/cros_ec_proto.c @@ -311,8 +311,8 @@ int cros_ec_query_all(struct cros_ec_dev ec_dev->max_response = EC_PROTO2_MAX_PARAM_SIZE; ec_dev->max_passthru = 0; ec_dev->pkt_xfer = NULL; - ec_dev->din_size = EC_MSG_BYTES; - ec_dev->dout_size = EC_MSG_BYTES; + ec_dev->din_size = EC_PROTO2_MSG_BYTES; + ec_dev->dout_size = EC_PROTO2_MSG_BYTES; } else { /* * It's possible for a test to occur too early when --- a/include/linux/mfd/cros_ec.h +++ b/include/linux/mfd/cros_ec.h @@ -50,9 +50,11 @@ enum { EC_MSG_TX_TRAILER_BYTES, EC_MSG_RX_PROTO_BYTES = 3,
- /* Max length of messages */ - EC_MSG_BYTES = EC_PROTO2_MAX_PARAM_SIZE + + /* Max length of messages for proto 2*/ + EC_PROTO2_MSG_BYTES = EC_PROTO2_MAX_PARAM_SIZE + EC_MSG_TX_PROTO_BYTES, + + EC_MAX_MSG_BYTES = 64 * 1024, };
/*
From: Filipe Manana fdmanana@suse.com
commit dbcc7d57bffc0c8cac9dac11bec548597d59a6a5 upstream.
While resolving backreferences, as part of a logical ino ioctl call or fiemap, we can end up hitting a BUG_ON() when replaying tree mod log operations of a root, triggering a stack trace like the following:
------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1210! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G W 5.11.0-2d11c0084b02-misc-next+ #89 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0 Code: 05 48 8d 74 10 (...) RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20 R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c FS: 00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0 Call Trace: btrfs_search_old_slot+0x265/0x10d0 ? lock_acquired+0xbb/0x600 ? btrfs_search_slot+0x1090/0x1090 ? free_extent_buffer.part.61+0xd7/0x140 ? free_extent_buffer+0x13/0x20 resolve_indirect_refs+0x3e9/0xfc0 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? add_prelim_ref.part.11+0x150/0x150 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? do_raw_spin_unlock+0xa8/0x140 ? rb_insert_color+0x30/0x360 ? prelim_ref_insert+0x12d/0x430 find_parent_nodes+0x5c3/0x1830 ? resolve_indirect_refs+0xfc0/0xfc0 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x160/0x210 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? poison_range+0x38/0x40 ? unpoison_range+0x14/0x40 ? trace_hardirqs_on+0x55/0x120 btrfs_find_all_roots_safe+0x142/0x1e0 ? find_parent_nodes+0x1830/0x1830 ? btrfs_inode_flags_to_xflags+0x50/0x50 iterate_extent_inodes+0x20e/0x580 ? tree_backref_for_extent+0x230/0x230 ? lock_downgrade+0x3d0/0x3d0 ? read_extent_buffer+0xdd/0x110 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? _raw_spin_unlock+0x22/0x30 ? __kasan_check_write+0x14/0x20 iterate_inodes_from_logical+0x129/0x170 ? iterate_inodes_from_logical+0x129/0x170 ? btrfs_inode_flags_to_xflags+0x50/0x50 ? iterate_extent_inodes+0x580/0x580 ? __vmalloc_node+0x92/0xb0 ? init_data_container+0x34/0xb0 ? init_data_container+0x34/0xb0 ? kvmalloc_node+0x60/0x80 btrfs_ioctl_logical_to_ino+0x158/0x230 btrfs_ioctl+0x205e/0x4040 ? __might_sleep+0x71/0xe0 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? getrusage+0x4b6/0x9c0 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __might_fault+0x64/0xd0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __task_pid_nr_ns+0xd3/0x250 ? lock_acquire+0xc7/0x510 ? __fget_files+0x160/0x230 ? __fget_light+0xf2/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fd1976e2427 Code: 00 00 90 48 8b 05 (...) RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427 RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004 RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120 R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004 R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8 Modules linked in: ---[ end trace ec8931a1c36e57be ]---
(gdb) l *(__tree_mod_log_rewind+0x3b1) 0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210). 1205 * the modification. as we're going backwards, we do the 1206 * opposite of each operation here. 1207 */ 1208 switch (tm->op) { 1209 case MOD_LOG_KEY_REMOVE_WHILE_FREEING: 1210 BUG_ON(tm->slot < n); 1211 fallthrough; 1212 case MOD_LOG_KEY_REMOVE_WHILE_MOVING: 1213 case MOD_LOG_KEY_REMOVE: 1214 btrfs_set_node_key(eb, &tm->key, tm->slot);
Here's what happens to hit that BUG_ON():
1) We have one tree mod log user (through fiemap or the logical ino ioctl), with a sequence number of 1, so we have fs_info->tree_mod_seq == 1;
2) Another task is at ctree.c:balance_level() and we have eb X currently as the root of the tree, and we promote its single child, eb Y, as the new root.
Then, at ctree.c:balance_level(), we call:
tree_mod_log_insert_root(eb X, eb Y, 1);
3) At tree_mod_log_insert_root() we create tree mod log elements for each slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each with a ->logical pointing to ebX->start. These are placed in an array named tm_list. Lets assume there are N elements (N pointers in eb X);
4) Then, still at tree_mod_log_insert_root(), we create a tree mod log element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set to the level of eb X and ->generation set to the generation of eb X;
5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with tm_list as argument. After that, tree_mod_log_free_eb() calls __tree_mod_log_insert() for each member of tm_list in reverse order, from highest slot in eb X, slot N - 1, to slot 0 of eb X;
6) __tree_mod_log_insert() sets the sequence number of each given tree mod log operation - it increments fs_info->tree_mod_seq and sets fs_info->tree_mod_seq as the sequence number of the given tree mod log operation.
This means that for the tm_list created at tree_mod_log_insert_root(), the element corresponding to slot 0 of eb X has the highest sequence number (1 + N), and the element corresponding to the last slot has the lowest sequence number (2);
7) Then, after inserting tm_list's elements into the tree mod log rbtree, the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest sequence number, which is N + 2;
8) Back to ctree.c:balance_level(), we free eb X by calling btrfs_free_tree_block() on it. Because eb X was created in the current transaction, has no other references and writeback did not happen for it, we add it back to the free space cache/tree;
9) Later some other task T allocates the metadata extent from eb X, since it is marked as free space in the space cache/tree, and uses it as a node for some other btree;
10) The tree mod log user task calls btrfs_search_old_slot(), which calls get_old_root(), and finally that calls __tree_mod_log_oldest_root() with time_seq == 1 and eb_root == eb Y;
11) First iteration of the while loop finds the tree mod log element with sequence number N + 2, for the logical address of eb Y and of type MOD_LOG_ROOT_REPLACE;
12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out of the loop, and set root_logical to point to tm->old_root.logical which corresponds to the logical address of eb X;
13) On the next iteration of the while loop, the call to tree_mod_log_search_oldest() returns the smallest tree mod log element for the logical address of eb X, which has a sequence number of 2, an operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to the old slot N - 1 of eb X (eb X had N items in it before being freed);
14) We then break out of the while loop and return the tree mod log operation of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of eb X, to get_old_root();
15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation and set "logical" to the logical address of eb X, which was the old root. We then call tree_mod_log_search() passing it the logical address of eb X and time_seq == 1;
16) Then before calling tree_mod_log_search(), task T adds a key to eb X, which results in adding a tree mod log operation of type MOD_LOG_KEY_ADD to the tree mod log - this is done at ctree.c:insert_ptr() - but after adding the tree mod log operation and before updating the number of items in eb X from 0 to 1...
17) The task at get_old_root() calls tree_mod_log_search() and gets the tree mod log operation of type MOD_LOG_KEY_ADD just added by task T. Then it enters the following if branch:
if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) { (...) } (...)
Calls read_tree_block() for eb X, which gets a reference on eb X but does not lock it - task T has it locked. Then it clones eb X while it has nritems set to 0 in its header, before task T sets nritems to 1 in eb X's header. From hereupon we use the clone of eb X which no other task has access to;
18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD mod log operation we just got from tree_mod_log_search() in the previous step and the cloned version of eb X;
19) At __tree_mod_log_rewind(), we set the local variable "n" to the number of items set in eb X's clone, which is 0. Then we enter the while loop, and in its first iteration we process the MOD_LOG_KEY_ADD operation, which just decrements "n" from 0 to (u32)-1, since "n" is declared with a type of u32. At the end of this iteration we call rb_next() to find the next tree mod log operation for eb X, that gives us the mod log operation of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence number of N + 1 (steps 3 to 6);
20) Then we go back to the top of the while loop and trigger the following BUG_ON():
(...) switch (tm->op) { case MOD_LOG_KEY_REMOVE_WHILE_FREEING: BUG_ON(tm->slot < n); fallthrough; (...)
Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0.
Fix this by taking a read lock on the extent buffer before cloning it at ctree.c:get_old_root(). This should be done regardless of the extent buffer having been freed and reused, as a concurrent task might be modifying it (while holding a write lock on it).
Reported-by: Zygo Blaxell ce3g8jdj@umail.furryterror.org Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/ Fixes: 834328a8493079 ("Btrfs: tree mod log's old roots could still be part of the tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ctree.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1431,7 +1431,9 @@ get_old_root(struct btrfs_root *root, u6 btrfs_warn(root->fs_info, "failed to read tree block %llu from get_old_root", logical); } else { + btrfs_tree_read_lock(old); eb = btrfs_clone_extent_buffer(old); + btrfs_tree_read_unlock(old); free_extent_buffer(old); } } else if (old_root) {
From: Joe Korty joe.korty@concurrent-rt.com
commit c7de87ff9dac5f396f62d584f3908f80ddc0e07b upstream.
[ This problem is in mainline, but only rt has the chops to be able to detect it. ]
Lockdep reports a circular lock dependency between serv->sv_lock and softirq_ctl.lock on system shutdown, when using a kernel built with CONFIG_PREEMPT_RT=y, and a nfs mount exists.
This is due to the definition of spin_lock_bh on rt:
local_bh_disable(); rt_spin_lock(lock);
which forces a softirq_ctl.lock -> serv->sv_lock dependency. This is not a problem as long as _every_ lock of serv->sv_lock is a:
spin_lock_bh(&serv->sv_lock);
but there is one of the form:
spin_lock(&serv->sv_lock);
This is what is causing the circular dependency splat. The spin_lock() grabs the lock without first grabbing softirq_ctl.lock via local_bh_disable. If later on in the critical region, someone does a local_bh_disable, we get a serv->sv_lock -> softirq_ctrl.lock dependency established. Deadlock.
Fix is to make serv->sv_lock be locked with spin_lock_bh everywhere, no exceptions.
[ OK ] Stopped target NFS client services. Stopping Logout off all iSCSI sessions on shutdown... Stopping NFS server and services... [ 109.442380] [ 109.442385] ====================================================== [ 109.442386] WARNING: possible circular locking dependency detected [ 109.442387] 5.10.16-rt30 #1 Not tainted [ 109.442389] ------------------------------------------------------ [ 109.442390] nfsd/1032 is trying to acquire lock: [ 109.442392] ffff994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xd9/0x270 [ 109.442405] [ 109.442405] but task is already holding lock: [ 109.442406] ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442415] [ 109.442415] which lock already depends on the new lock. [ 109.442415] [ 109.442416] [ 109.442416] the existing dependency chain (in reverse order) is: [ 109.442417] [ 109.442417] -> #1 (&serv->sv_lock){+.+.}-{0:0}: [ 109.442421] rt_spin_lock+0x2b/0xc0 [ 109.442428] svc_add_new_perm_xprt+0x42/0xa0 [ 109.442430] svc_addsock+0x135/0x220 [ 109.442434] write_ports+0x4b3/0x620 [ 109.442438] nfsctl_transaction_write+0x45/0x80 [ 109.442440] vfs_write+0xff/0x420 [ 109.442444] ksys_write+0x4f/0xc0 [ 109.442446] do_syscall_64+0x33/0x40 [ 109.442450] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 109.442454] [ 109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}: [ 109.442457] __lock_acquire+0x1264/0x20b0 [ 109.442463] lock_acquire+0xc2/0x400 [ 109.442466] rt_spin_lock+0x2b/0xc0 [ 109.442469] __local_bh_disable_ip+0xd9/0x270 [ 109.442471] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442474] svc_close_list+0x60/0x90 [ 109.442476] svc_close_net+0x49/0x1a0 [ 109.442478] svc_shutdown_net+0x12/0x40 [ 109.442480] nfsd_destroy+0xc5/0x180 [ 109.442482] nfsd+0x1bc/0x270 [ 109.442483] kthread+0x194/0x1b0 [ 109.442487] ret_from_fork+0x22/0x30 [ 109.442492] [ 109.442492] other info that might help us debug this: [ 109.442492] [ 109.442493] Possible unsafe locking scenario: [ 109.442493] [ 109.442493] CPU0 CPU1 [ 109.442494] ---- ---- [ 109.442495] lock(&serv->sv_lock); [ 109.442496] lock((softirq_ctrl.lock).lock); [ 109.442498] lock(&serv->sv_lock); [ 109.442499] lock((softirq_ctrl.lock).lock); [ 109.442501] [ 109.442501] *** DEADLOCK *** [ 109.442501] [ 109.442501] 3 locks held by nfsd/1032: [ 109.442503] #0: ffffffff93b49258 (nfsd_mutex){+.+.}-{3:3}, at: nfsd+0x19a/0x270 [ 109.442508] #1: ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442512] #2: ffffffff93a81b20 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xc0 [ 109.442518] [ 109.442518] stack backtrace: [ 109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1 [ 109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 09/22/2015 [ 109.442524] Call Trace: [ 109.442527] dump_stack+0x77/0x97 [ 109.442533] check_noncircular+0xdc/0xf0 [ 109.442546] __lock_acquire+0x1264/0x20b0 [ 109.442553] lock_acquire+0xc2/0x400 [ 109.442564] rt_spin_lock+0x2b/0xc0 [ 109.442570] __local_bh_disable_ip+0xd9/0x270 [ 109.442573] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442577] svc_close_list+0x60/0x90 [ 109.442581] svc_close_net+0x49/0x1a0 [ 109.442585] svc_shutdown_net+0x12/0x40 [ 109.442588] nfsd_destroy+0xc5/0x180 [ 109.442590] nfsd+0x1bc/0x270 [ 109.442595] kthread+0x194/0x1b0 [ 109.442600] ret_from_fork+0x22/0x30 [ 109.518225] nfsd: last server has exited, flushing export cache [ OK ] Stopped NFSv4 ID-name mapping service. [ OK ] Stopped GSSAPI Proxy Daemon. [ OK ] Stopped NFS Mount Daemon. [ OK ] Stopped NFS status monitor for NFSv2/3 locking..
Fixes: 719f8bcc883e ("svcrpc: fix xpt_list traversal locking on shutdown") Signed-off-by: Joe Korty joe.korty@concurrent-rt.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/svc_xprt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1011,7 +1011,7 @@ static int svc_close_list(struct svc_ser struct svc_xprt *xprt; int ret = 0;
- spin_lock(&serv->sv_lock); + spin_lock_bh(&serv->sv_lock); list_for_each_entry(xprt, xprt_list, xpt_list) { if (xprt->xpt_net != net) continue; @@ -1019,7 +1019,7 @@ static int svc_close_list(struct svc_ser set_bit(XPT_CLOSE, &xprt->xpt_flags); svc_xprt_enqueue(xprt); } - spin_unlock(&serv->sv_lock); + spin_unlock_bh(&serv->sv_lock); return ret; }
From: Dan Carpenter dan.carpenter@oracle.com
commit 19f1bc7edf0f97186810e13a88f5b62069d89097 upstream.
If copy_from_user() or kstrtoull() fail then the correct behavior is to return a negative error code.
Link: https://lore.kernel.org/r/YEsbU/UxYypVrC7/@mwanda Fixes: f9bb2da11db8 ("[SCSI] lpfc 8.3.27: T10 additions for SLI4") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/lpfc/lpfc_debugfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/scsi/lpfc/lpfc_debugfs.c +++ b/drivers/scsi/lpfc/lpfc_debugfs.c @@ -1061,7 +1061,7 @@ lpfc_debugfs_dif_err_write(struct file * memset(dstbuf, 0, 32); size = (nbytes < 32) ? nbytes : 32; if (copy_from_user(dstbuf, buf, size)) - return 0; + return -EFAULT;
if (dent == phba->debug_InjErrLBA) { if ((buf[0] == 'o') && (buf[1] == 'f') && (buf[2] == 'f')) @@ -1069,7 +1069,7 @@ lpfc_debugfs_dif_err_write(struct file * }
if ((tmp == 0) && (kstrtoull(dstbuf, 0, &tmp))) - return 0; + return -EINVAL;
if (dent == phba->debug_writeGuard) phba->lpfc_injerr_wgrd_cnt = (uint32_t)tmp;
From: Macpaul Lin macpaul.lin@mediatek.com
commit 81c7462883b0cc0a4eeef0687f80ad5b5baee5f6 upstream.
Replace hardcoded maximum USB string length (126 bytes) by definition "USB_MAX_STRING_LEN".
Signed-off-by: Macpaul Lin macpaul.lin@mediatek.com Acked-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/1592471618-29428-1-git-send-email-macpaul.lin@medi... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/composite.c | 4 ++-- drivers/usb/gadget/configfs.c | 2 +- drivers/usb/gadget/usbstring.c | 4 ++-- include/uapi/linux/usb/ch9.h | 3 +++ 4 files changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/usb/gadget/composite.c +++ b/drivers/usb/gadget/composite.c @@ -933,7 +933,7 @@ static void collect_langs(struct usb_gad while (*sp) { s = *sp; language = cpu_to_le16(s->language); - for (tmp = buf; *tmp && tmp < &buf[126]; tmp++) { + for (tmp = buf; *tmp && tmp < &buf[USB_MAX_STRING_LEN]; tmp++) { if (*tmp == language) goto repeat; } @@ -1008,7 +1008,7 @@ static int get_string(struct usb_composi collect_langs(sp, s->wData); }
- for (len = 0; len <= 126 && s->wData[len]; len++) + for (len = 0; len <= USB_MAX_STRING_LEN && s->wData[len]; len++) continue; if (!len) return -EINVAL; --- a/drivers/usb/gadget/configfs.c +++ b/drivers/usb/gadget/configfs.c @@ -117,7 +117,7 @@ static int usb_string_copy(const char *s char *str; char *copy = *s_copy; ret = strlen(s); - if (ret > 126) + if (ret > USB_MAX_STRING_LEN) return -EOVERFLOW;
str = kstrdup(s, GFP_KERNEL); --- a/drivers/usb/gadget/usbstring.c +++ b/drivers/usb/gadget/usbstring.c @@ -59,9 +59,9 @@ usb_gadget_get_string (struct usb_gadget return -EINVAL;
/* string descriptors have length, tag, then UTF16-LE text */ - len = min ((size_t) 126, strlen (s->s)); + len = min((size_t)USB_MAX_STRING_LEN, strlen(s->s)); len = utf8s_to_utf16s(s->s, len, UTF16_LITTLE_ENDIAN, - (wchar_t *) &buf[2], 126); + (wchar_t *) &buf[2], USB_MAX_STRING_LEN); if (len < 0) return -EINVAL; buf [0] = (len + 1) * 2; --- a/include/uapi/linux/usb/ch9.h +++ b/include/uapi/linux/usb/ch9.h @@ -333,6 +333,9 @@ struct usb_config_descriptor {
/*-------------------------------------------------------------------------*/
+/* USB String descriptors can contain at most 126 characters. */ +#define USB_MAX_STRING_LEN 126 + /* USB_DT_STRING: String descriptor */ struct usb_string_descriptor { __u8 bLength;
From: Jim Lin jilin@nvidia.com
commit 98f153a10da403ddd5e9d98a3c8c2bb54bb5a0b6 upstream.
When gadget is disconnected, running sequence is like this. . composite_disconnect . Call trace: usb_string_copy+0xd0/0x128 gadget_config_name_configuration_store+0x4 gadget_config_name_attr_store+0x40/0x50 configfs_write_file+0x198/0x1f4 vfs_write+0x100/0x220 SyS_write+0x58/0xa8 . configfs_composite_unbind . configfs_composite_bind
In configfs_composite_bind, it has "cn->strings.s = cn->configuration;"
When usb_string_copy is invoked. it would allocate memory, copy input string, release previous pointed memory space, and use new allocated memory.
When gadget is connected, host sends down request to get information. Call trace: usb_gadget_get_string+0xec/0x168 lookup_string+0x64/0x98 composite_setup+0xa34/0x1ee8
If gadget is disconnected and connected quickly, in the failed case, cn->configuration memory has been released by usb_string_copy kfree but configfs_composite_bind hasn't been run in time to assign new allocated "cn->configuration" pointer to "cn->strings.s".
When "strlen(s->s) of usb_gadget_get_string is being executed, the dangling memory is accessed, "BUG: KASAN: use-after-free" error occurs.
Cc: stable@vger.kernel.org Signed-off-by: Jim Lin jilin@nvidia.com Signed-off-by: Macpaul Lin macpaul.lin@mediatek.com Link: https://lore.kernel.org/r/1615444961-13376-1-git-send-email-macpaul.lin@medi... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/configfs.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
--- a/drivers/usb/gadget/configfs.c +++ b/drivers/usb/gadget/configfs.c @@ -111,6 +111,8 @@ struct gadget_config_name { struct list_head list; };
+#define USB_MAX_STRING_WITH_NULL_LEN (USB_MAX_STRING_LEN+1) + static int usb_string_copy(const char *s, char **s_copy) { int ret; @@ -120,12 +122,16 @@ static int usb_string_copy(const char *s if (ret > USB_MAX_STRING_LEN) return -EOVERFLOW;
- str = kstrdup(s, GFP_KERNEL); - if (!str) - return -ENOMEM; + if (copy) { + str = copy; + } else { + str = kmalloc(USB_MAX_STRING_WITH_NULL_LEN, GFP_KERNEL); + if (!str) + return -ENOMEM; + } + strcpy(str, s); if (str[ret - 1] == '\n') str[ret - 1] = '\0'; - kfree(copy); *s_copy = str; return 0; }
From: Tyrel Datwyler tyreld@linux.ibm.com
commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream.
Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte.
This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte.
Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack.
The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value.
drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19
Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present.
Cc: stable@vger.kernel.org Signed-off-by: Tyrel Datwyler tyreld@linux.ibm.com [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210315214821.452959-1-tyreld@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
--- a/drivers/pci/hotplug/rpadlpar_sysfs.c +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c @@ -39,12 +39,11 @@ static ssize_t add_slot_store(struct kob if (nbytes >= MAX_DRC_NAME_LEN) return 0;
- memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1);
end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0';
rc = dlpar_add_slot(drc_name); if (rc) @@ -70,12 +69,11 @@ static ssize_t remove_slot_store(struct if (nbytes >= MAX_DRC_NAME_LEN) return 0;
- memcpy(drc_name, buf, nbytes); + strscpy(drc_name, buf, nbytes + 1);
end = strchr(drc_name, '\n'); - if (!end) - end = &drc_name[nbytes]; - *end = '\0'; + if (end) + *end = '\0';
rc = dlpar_remove_slot(drc_name); if (rc)
From: Thomas Gleixner tglx@linutronix.de
commit a501b048a95b79e1e34f03cac3c87ff1e9f229ad upstream.
Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where the matrix allocator claimed to be out of vectors. He analyzed it down to the point that IRQ2, the PIC cascade interrupt, which is supposed to be not ever routed to the IO/APIC ended up having an interrupt vector assigned which got moved during unplug of CPU0.
The underlying issue is that IRQ2 for various reasons (see commit af174783b925 ("x86: I/O APIC: Never configure IRQ2" for details) is treated as a reserved system vector by the vector core code and is not accounted as a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2 which causes the IO/APIC setup to claim that interrupt which is granted by the vector domain because there is no sanity check. As a consequence the allocation counter of CPU0 underflows which causes a subsequent unplug to fail with:
[ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
There is another sanity check missing in the matrix allocator, but the underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic during the conversion to irqdomains.
For almost 6 years nobody complained about this wreckage, which might indicate that this requirement could be lifted, but for any system which actually has a PIC IRQ2 is unusable by design so any routing entry has no effect and the interrupt cannot be connected to a device anyway.
Due to that and due to history biased paranoia reasons restore the IRQ2 ignore logic and treat it as non existent despite a routing entry claiming otherwise.
Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Reported-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Vitaly Kuznetsov vkuznets@redhat.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/apic/io_apic.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1040,6 +1040,16 @@ static int mp_map_pin_to_irq(u32 gsi, in if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci)) { irq = mp_irqs[idx].srcbusirq; legacy = mp_is_legacy_irq(irq); + /* + * IRQ2 is unusable for historical reasons on systems which + * have a legacy PIC. See the comment vs. IRQ2 further down. + * + * If this gets removed at some point then the related code + * in lapic_assign_system_vectors() needs to be adjusted as + * well. + */ + if (legacy && irq == PIC_CASCADE_IR) + return -EINVAL; }
mutex_lock(&ioapic_mutex);
From: zhangyi (F) yi.zhang@huawei.com
commit b7ff91fd030dc9d72ed91b1aab36e445a003af4f upstream.
If we failed to add new entry on rename whiteout, we cannot reset the old->de entry directly, because the old->de could have moved from under us during make indexed dir. So find the old entry again before reset is needed, otherwise it may corrupt the filesystem as below.
/dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED. /dev/sda: Unattached inode 75 /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
Fixes: 6b4b8e6b4ad ("ext4: fix bug for rename with RENAME_WHITEOUT") Cc: stable@vger.kernel.org Signed-off-by: zhangyi (F) yi.zhang@huawei.com Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/namei.c | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-)
--- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -3375,6 +3375,31 @@ static int ext4_setent(handle_t *handle, return 0; }
+static void ext4_resetent(handle_t *handle, struct ext4_renament *ent, + unsigned ino, unsigned file_type) +{ + struct ext4_renament old = *ent; + int retval = 0; + + /* + * old->de could have moved from under us during make indexed dir, + * so the old->de may no longer valid and need to find it again + * before reset old inode info. + */ + old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL); + if (IS_ERR(old.bh)) + retval = PTR_ERR(old.bh); + if (!old.bh) + retval = -ENOENT; + if (retval) { + ext4_std_error(old.dir->i_sb, retval); + return; + } + + ext4_setent(handle, &old, ino, file_type); + brelse(old.bh); +} + static int ext4_find_delete_entry(handle_t *handle, struct inode *dir, const struct qstr *d_name) { @@ -3674,8 +3699,8 @@ static int ext4_rename(struct inode *old end_rename: if (whiteout) { if (retval) { - ext4_setent(handle, &old, - old.inode->i_ino, old_file_type); + ext4_resetent(handle, &old, + old.inode->i_ino, old_file_type); drop_nlink(whiteout); } unlock_new_inode(whiteout);
From: Shijie Luo luoshijie1@huawei.com
commit 7d8bd3c76da1d94b85e6c9b7007e20e980bfcfe6 upstream.
If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(), the error code will be overridden, go to out_brelse to avoid this situation.
Signed-off-by: Shijie Luo luoshijie1@huawei.com Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com Cc: stable@kernel.org Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/inode.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4626,7 +4626,7 @@ static int ext4_do_update_inode(handle_t struct ext4_inode_info *ei = EXT4_I(inode); struct buffer_head *bh = iloc->bh; struct super_block *sb = inode->i_sb; - int err = 0, rc, block; + int err = 0, block; int need_datasync = 0, set_large_file = 0; uid_t i_uid; gid_t i_gid; @@ -4726,9 +4726,9 @@ static int ext4_do_update_inode(handle_t bh->b_data);
BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); - rc = ext4_handle_dirty_metadata(handle, NULL, bh); - if (!err) - err = rc; + err = ext4_handle_dirty_metadata(handle, NULL, bh); + if (err) + goto out_brelse; ext4_clear_inode_state(inode, EXT4_STATE_NEW); if (set_large_file) { BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access");
From: Thomas Gleixner tglx@linutronix.de
commit 81e2073c175b887398e5bca6c004efa89983f58d upstream.
With interrupt force threading all device interrupt handlers are invoked from kernel threads. Contrary to hard interrupt context the invocation only disables bottom halfs, but not interrupts. This was an oversight back then because any code like this will have an issue:
thread(irq_A) irq_handler(A) spin_lock(&foo->lock);
interrupt(irq_B) irq_handler(B) spin_lock(&foo->lock);
This has been triggered with networking (NAPI vs. hrtimers) and console drivers where printk() happens from an interrupt which interrupted the force threaded handler.
Now people noticed and started to change the spin_lock() in the handler to spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the interrupt request which in turn breaks RT.
Fix the root cause and not the symptom and disable interrupts before invoking the force threaded handler which preserves the regular semantics and the usefulness of the interrupt force threading as a general debugging tool.
For not RT this is not changing much, except that during the execution of the threaded handler interrupts are delayed until the handler returns. Vs. scheduling and softirq processing there is no difference.
For RT kernels there is no issue.
Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading") Reported-by: Johan Hovold johan@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Johan Hovold johan@kernel.org Acked-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/irq/manage.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -872,11 +872,15 @@ irq_forced_thread_fn(struct irq_desc *de irqreturn_t ret;
local_bh_disable(); + if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE)) + local_irq_disable(); ret = action->thread_fn(action->irq, action->dev_id); if (ret == IRQ_HANDLED) atomic_inc(&desc->threads_handled);
irq_finalize_oneshot(desc, action); + if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE)) + local_irq_enable(); local_bh_enable(); return ret; }
On Mon, 22 Mar 2021 13:28:54 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.263 release. There are 14 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Mar 2021 12:19:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.263-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v4.4: 6 builds: 6 pass, 0 fail 12 boots: 12 pass, 0 fail 27 tests: 27 pass, 0 fail
Linux version: 4.4.263-rc1-g769f344ebed2 Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi!
This is the start of the stable review cycle for the 4.4.263 release. There are 14 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
On Mon, Mar 22, 2021 at 01:28:54PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.263 release. There are 14 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Mar 2021 12:19:09 +0000. Anything received after that time might be too late.
Build results: total: 163 pass: 163 fail: 0 Qemu test results: total: 329 pass: 329 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 22 Mar 2021 at 18:24, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.4.263 release. There are 14 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 24 Mar 2021 12:19:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.263-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
Summary ------------------------------------------------------------------------
kernel: 4.4.263-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 769f344ebed2d7d9adc324015195b8c3fda886da git describe: v4.4.262-15-g769f344ebed2 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.4.y/build/v4.4.26...
No regressions (compared to build v4.4.262)
No fixes (compared to build v4.4.262)
Ran 36066 total tests in the following environments and test suites.
Environments -------------- - arm - arm64 - i386 - juno-64k_page_size - juno-r2 - arm64 - juno-r2-compat - juno-r2-kasan - mips - qemu-arm64-kasan - qemu-x86_64-kasan - qemu_arm - qemu_arm64 - qemu_arm64-compat - qemu_i386 - qemu_x86_64 - qemu_x86_64-compat - sparc - x15 - arm - x86_64 - x86-kasan - x86_64
Test Suites ----------- * build * linux-log-parser * libhugetlbfs * ltp-commands-tests * ltp-containers-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-securebits-tests * ltp-tracing-tests * network-basic-tests * ltp-cap_bounds-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-mm-tests * ltp-sched-tests * ltp-syscalls-tests * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-livepatch * kselftest-lkdtm * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-zram * kvm-unit-tests * ltp-open-posix-tests * perf * v4l2-compliance * install-android-platform-tools-r2600 * kselftest-kvm * kselftest-vm * fwts
Summary ------------------------------------------------------------------------
kernel: 4.4.263-rc1 git repo: https://git.linaro.org/lkft/arm64-stable-rc.git git branch: 4.4.263-rc1-hikey-20210322-970 git commit: cdbe5973893696eb29efb23b75953efd2f7edf4d git describe: 4.4.263-rc1-hikey-20210322-970 Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.2...
No regressions (compared to 4.4.263-rc1-hikey-20210319-968)
No fixes (compared to 4.4.263-rc1-hikey-20210319-968)
Ran 1863 total tests in the following environments and test suites.
Environments -------------- - hi6220-hikey - arm64
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-zram * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * perf * spectre-meltdown-checker-test * v4l2-compliance
linux-stable-mirror@lists.linaro.org