This is the start of the stable review cycle for the 5.4.221 release. There are 53 patches in this series; all will be posted as responses to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat, 29 Oct 2022 16:50:35 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.221-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 5.4.221-rc1

Seth Jenkins <sethjenkins@google.com>
    mm: /proc/pid/smaps_rollup: fix no vma's null-deref

Gaurav Kohli <gauravkohli@linux.microsoft.com>
    hv_netvsc: Fix race between VF offering and VF association message from host

Nick Desaulniers <ndesaulniers@google.com>
    Makefile.debug: re-enable debug info for .S files

Werner Sembach <wse@tuxedocomputers.com>
    ACPI: video: Force backlight native for more TongFang devices

Conor Dooley <conor.dooley@microchip.com>
    riscv: topology: fix default topology reporting

Conor Dooley <conor.dooley@microchip.com>
    arm64: topology: move store_cpu_topology() to shared code

Jerry Snitselaar <jsnitsel@redhat.com>
    iommu/vt-d: Clean up si_domain in the init_dmars() error path

Yang Yingliang <yangyingliang@huawei.com>
    net: hns: fix possible memory leak in hnae_ae_register()

Zhengchao Shao <shaozhengchao@huawei.com>
    net: sched: cake: fix null pointer access issue when cake_init() fails

Harini Katakam <harini.katakam@amd.com>
    net: phy: dp83867: Extend RX strap quirk for SGMII mode

Xiaobo Liu <cppcoffee@gmail.com>
    net/atm: fix proc_mpc_write incorrect return value

José Expósito <jose.exposito89@gmail.com>
    HID: magicmouse: Do not set BTN_MOUSE on double report

Alexander Potapenko <glider@google.com>
    tipc: fix an information leak in tipc_topsrv_kern_subscr

Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
    tipc: Fix recognition of trial period

Tony Luck <tony.luck@intel.com>
    ACPI: extlog: Handle multiple records

Filipe Manana <fdmanana@suse.com>
    btrfs: fix processing of delayed tree block refs during backref walking

Filipe Manana <fdmanana@suse.com>
    btrfs: fix processing of delayed data refs during backref walking

Jean-Francois Le Fillatre <jflf_kernel@gmx.com>
    r8152: add PID for the Lenovo OneLink+ Dock

James Morse <james.morse@arm.com>
    arm64: errata: Remove AES hwcap for COMPAT tasks

Bryan O'Donoghue <bryan.odonoghue@linaro.org>
    media: venus: dec: Handle the case where find_format fails

Eric Ren <renzhengeek@gmail.com>
    KVM: arm64: vgic: Fix exit condition in scan_its_table()

Kai-Heng Feng <kai.heng.feng@canonical.com>
    ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS

Alexander Stein <alexander.stein@ew.tq-group.com>
    ata: ahci-imx: Fix MODULE_ALIAS

Zhang Rui <rui.zhang@intel.com>
    hwmon/coretemp: Handle large core ID value

Borislav Petkov <bp@suse.de>
    x86/microcode/AMD: Apply the patch early on every logical thread

Joseph Qi <joseph.qi@linux.alibaba.com>
    ocfs2: fix BUG when iput after ocfs2_mknod fails

Joseph Qi <joseph.qi@linux.alibaba.com>
    ocfs2: clear dinode links count in case of error

Dave Chinner <dchinner@redhat.com>
    xfs: fix use-after-free on CIL context on shutdown

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: move inode flush to the sync workqueue

Christoph Hellwig <hch@lst.de>
    xfs: reflink should force the log out if mounted with wsync

Christoph Hellwig <hch@lst.de>
    xfs: factor out a new xfs_log_force_inode helper

Brian Foster <bfoster@redhat.com>
    xfs: trylock underlying buffer on dquot flush

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: don't write a corrupt unmount record to force summary counter recalc

Dave Chinner <dchinner@redhat.com>
    xfs: tail updates only need to occur when LSN changes

Dave Chinner <dchinner@redhat.com>
    xfs: factor common AIL item deletion code

Dave Chinner <dchinner@redhat.com>
    xfs: Throttle commits on delayed background CIL push

Dave Chinner <dchinner@redhat.com>
    xfs: Lower CIL flush limit for large logs

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: preserve default grace interval during quotacheck

Brian Foster <bfoster@redhat.com>
    xfs: fix unmount hang and memory leak on shutdown during quotaoff

Brian Foster <bfoster@redhat.com>
    xfs: factor out quotaoff intent AIL removal and memory free

Pavel Reichl <preichl@redhat.com>
    xfs: Replace function declaration by actual definition

Pavel Reichl <preichl@redhat.com>
    xfs: remove the xfs_qoff_logitem_t typedef

Pavel Reichl <preichl@redhat.com>
    xfs: remove the xfs_dq_logitem_t typedef

Pavel Reichl <preichl@redhat.com>
    xfs: remove the xfs_disk_dquot_t and xfs_dquot_t

Takashi Iwai <tiwai@suse.de>
    xfs: Use scnprintf() for avoiding potential buffer overflow

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: check owner of dir3 blocks

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: check owner of dir3 data blocks

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: xfs_buf_corruption_error should take __this_address

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: add a function to deal with corrupt buffers post-verifiers

Brian Foster <bfoster@redhat.com>
    xfs: rework collapse range into an atomic operation

Brian Foster <bfoster@redhat.com>
    xfs: rework insert range into an atomic operation

Brian Foster <bfoster@redhat.com>
    xfs: open code insert range extent split helper
-------------
Diffstat:
 Documentation/arm64/silicon-errata.rst | 4 +
 Makefile | 8 +-
 arch/arm64/Kconfig | 16 ++++
 arch/arm64/include/asm/cpucaps.h | 3 +-
 arch/arm64/kernel/cpu_errata.c | 16 ++++
 arch/arm64/kernel/cpufeature.c | 13 ++-
 arch/arm64/kernel/topology.c | 40 ---------
 arch/riscv/Kconfig | 2 +-
 arch/riscv/kernel/smpboot.c | 4 +-
 arch/x86/kernel/cpu/microcode/amd.c | 16 +++-
 drivers/acpi/acpi_extlog.c | 33 ++++---
 drivers/acpi/video_detect.c | 64 ++++++++++++
 drivers/ata/ahci.h | 2 +-
 drivers/ata/ahci_imx.c | 2 +-
 drivers/base/arch_topology.c | 19 ++++
 drivers/hid/hid-magicmouse.c | 2 +-
 drivers/hwmon/coretemp.c | 56 ++++++----
 drivers/iommu/intel-iommu.c | 5 ++
 drivers/media/platform/qcom/venus/vdec.c | 2 +
 drivers/net/ethernet/hisilicon/hns/hnae.c | 4 +-
 drivers/net/hyperv/hyperv_net.h | 3 +
 drivers/net/hyperv/netvsc.c | 4 +
 drivers/net/hyperv/netvsc_drv.c | 20 +++++
 drivers/net/phy/dp83867.c | 8 ++
 drivers/net/usb/cdc_ether.c | 7 ++
 drivers/net/usb/r8152.c | 1 +
 fs/btrfs/backref.c | 46 ++++----
 fs/ocfs2/namei.c | 23 +++--
 fs/proc/task_mmu.c | 2 +-
 fs/xfs/libxfs/xfs_alloc.c | 2 +-
 fs/xfs/libxfs/xfs_attr_leaf.c | 6 +-
 fs/xfs/libxfs/xfs_bmap.c | 32 +------
 fs/xfs/libxfs/xfs_bmap.h | 3 +-
 fs/xfs/libxfs/xfs_btree.c | 2 +-
 fs/xfs/libxfs/xfs_da_btree.c | 10 +--
 fs/xfs/libxfs/xfs_dir2_block.c | 33 ++++++-
 fs/xfs/libxfs/xfs_dir2_data.c | 32 ++++++-
 fs/xfs/libxfs/xfs_dir2_leaf.c | 2 +-
 fs/xfs/libxfs/xfs_dir2_node.c | 8 +-
 fs/xfs/libxfs/xfs_dquot_buf.c | 8 +-
 fs/xfs/libxfs/xfs_format.h | 10 +--
 fs/xfs/libxfs/xfs_trans_resv.c | 6 +-
 fs/xfs/xfs_attr_inactive.c | 6 +-
 fs/xfs/xfs_attr_list.c | 2 +-
 fs/xfs/xfs_bmap_util.c | 57 ++++------
 fs/xfs/xfs_buf.c | 22 +++++
 fs/xfs/xfs_buf.h | 2 +
 fs/xfs/xfs_dquot.c | 26 +++---
 fs/xfs/xfs_dquot.h | 98 +++++++++++----------
 fs/xfs/xfs_dquot_item.c | 47 +++++++---
 fs/xfs/xfs_dquot_item.h | 35 ++++----
 fs/xfs/xfs_error.c | 7 +-
 fs/xfs/xfs_error.h | 2 +-
 fs/xfs/xfs_export.c | 14 +--
 fs/xfs/xfs_file.c | 16 ++--
 fs/xfs/xfs_inode.c | 23 ++++-
 fs/xfs/xfs_inode.h | 1 +
 fs/xfs/xfs_inode_item.c | 28 +++---
 fs/xfs/xfs_log.c | 26 +++---
 fs/xfs/xfs_log_cil.c | 39 ++++--
 fs/xfs/xfs_log_priv.h | 53 +++++++--
 fs/xfs/xfs_log_recover.c | 5 +-
 fs/xfs/xfs_mount.h | 5 ++
 fs/xfs/xfs_qm.c | 64 ++++++------
 fs/xfs/xfs_qm_bhv.c | 6 +-
 fs/xfs/xfs_qm_syscalls.c | 142 +++++++++++++++----------
 fs/xfs/xfs_stats.c | 10 +--
 fs/xfs/xfs_super.c | 28 ++++--
 fs/xfs/xfs_trace.h | 1 +
 fs/xfs/xfs_trans_ail.c | 88 +++++++++------
 fs/xfs/xfs_trans_dquot.c | 54 ++++------
 fs/xfs/xfs_trans_priv.h | 6 +-
 net/atm/mpoa_proc.c | 3 +-
 net/sched/sch_cake.c | 4 +
 net/tipc/discover.c | 2 +-
 net/tipc/topsrv.c | 2 +-
 virt/kvm/arm/vgic/vgic-its.c | 5 +-
 77 files changed, 973 insertions(+), 535 deletions(-)
From: Brian Foster <bfoster@redhat.com>
commit b73df17e4c5ba977205253fb7ef54267717a3cba upstream.
The insert range operation currently splits the extent at the target offset in a separate transaction and lock cycle from the one that shifts extents. In preparation for reworking insert range into an atomic operation, lift the code into the caller so it can be easily condensed to a single rolling transaction and lock cycle and eliminate the helper. No functional changes.
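To make the mechanics concrete, here is a condensed sketch of the new calling convention (editor's sketch distilled from the hunks below, not part of the patch; error handling trimmed):

	/* before: the helper owned its own transaction and lock cycle */
	error = xfs_bmap_split_extent(ip, stop_fsb);

	/* after: the caller supplies a transaction with the inode joined */
	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write,
			XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, 0, &tp);
	if (error)
		return error;
	xfs_ilock(ip, XFS_ILOCK_EXCL);
	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
	error = xfs_bmap_split_extent(tp, ip, stop_fsb);	/* now takes tp */
	if (error)
		goto out_trans_cancel;
	error = xfs_trans_commit(tp);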
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 32 ++------------------------------
 fs/xfs/libxfs/xfs_bmap.h |  3 ++-
 fs/xfs/xfs_bmap_util.c   | 14 +++++++++++++-
 3 files changed, 17 insertions(+), 32 deletions(-)
--- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5925,8 +5925,8 @@ del_cursor: * @split_fsb is a block where the extents is split. If split_fsb lies in a * hole or the first block of extents, just return 0. */ -STATIC int -xfs_bmap_split_extent_at( +int +xfs_bmap_split_extent( struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t split_fsb) @@ -6037,34 +6037,6 @@ del_cursor: return error; }
-int -xfs_bmap_split_extent( - struct xfs_inode *ip, - xfs_fileoff_t split_fsb) -{ - struct xfs_mount *mp = ip->i_mount; - struct xfs_trans *tp; - int error; - - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, - XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, 0, &tp); - if (error) - return error; - - xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); - - error = xfs_bmap_split_extent_at(tp, ip, split_fsb); - if (error) - goto out; - - return xfs_trans_commit(tp); - -out: - xfs_trans_cancel(tp); - return error; -} - /* Deferred mapping is only for real extents in the data fork. */ static bool xfs_bmap_is_update_needed( --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -222,7 +222,8 @@ int xfs_bmap_can_insert_extents(struct x int xfs_bmap_insert_extents(struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb, bool *done, xfs_fileoff_t stop_fsb); -int xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset); +int xfs_bmap_split_extent(struct xfs_trans *tp, struct xfs_inode *ip, + xfs_fileoff_t split_offset); int xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork, xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc, struct xfs_bmbt_irec *got, struct xfs_iext_cursor *cur, --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1326,7 +1326,19 @@ xfs_insert_file_space( * is not the starting block of extent, we need to split the extent at * stop_fsb. */ - error = xfs_bmap_split_extent(ip, stop_fsb); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, + XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, 0, &tp); + if (error) + return error; + + xfs_ilock(ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + + error = xfs_bmap_split_extent(tp, ip, stop_fsb); + if (error) + goto out_trans_cancel; + + error = xfs_trans_commit(tp); if (error) return error;
From: Brian Foster <bfoster@redhat.com>
commit dd87f87d87fa4359a54e7b44549742f579e3e805 upstream.
The insert range operation uses a unique transaction and ilock cycle for the extent split and each extent shift iteration of the overall operation. While this works, it risks racing with other operations in subtle ways, such as COW writeback modifying an extent tree in the middle of a shift operation.
To avoid this problem, make insert range atomic with respect to ilock. Hold the ilock across the entire operation, replace the individual transactions with a single rolling transaction sequence and relog the inode to keep it moving in the log. This guarantees that nothing else can change the extent mapping of an inode while an insert range operation is in progress.
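Condensed, the post-patch shape of xfs_insert_file_space() looks like this (editor's sketch distilled from the hunks below, not part of the patch; error handling trimmed):

	xfs_ilock(ip, XFS_ILOCK_EXCL);	/* held across the whole operation */
	xfs_trans_ijoin(tp, ip, 0);	/* 0: unlock manually after commit */

	error = xfs_bmap_split_extent(tp, ip, stop_fsb);
	if (error)
		goto out_trans_cancel;

	do {
		/* commit the transaction and relog the inode */
		error = xfs_trans_roll_inode(&tp, ip);
		if (error)
			goto out_trans_cancel;
		error = xfs_bmap_insert_extents(tp, ip, &next_fsb, shift_fsb,
				&done, stop_fsb);
		if (error)
			goto out_trans_cancel;
	} while (!done);

	error = xfs_trans_commit(tp);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);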
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_bmap_util.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)
--- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1321,47 +1321,41 @@ xfs_insert_file_space( if (error) return error;
- /* - * The extent shifting code works on extent granularity. So, if stop_fsb - * is not the starting block of extent, we need to split the extent at - * stop_fsb. - */ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, 0, &tp); if (error) return error;
xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, ip, 0);
+ /* + * The extent shifting code works on extent granularity. So, if stop_fsb + * is not the starting block of extent, we need to split the extent at + * stop_fsb. + */ error = xfs_bmap_split_extent(tp, ip, stop_fsb); if (error) goto out_trans_cancel;
- error = xfs_trans_commit(tp); - if (error) - return error; - - while (!error && !done) { - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, - &tp); + do { + error = xfs_trans_roll_inode(&tp, ip); if (error) - break; + goto out_trans_cancel;
- xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); error = xfs_bmap_insert_extents(tp, ip, &next_fsb, shift_fsb, &done, stop_fsb); if (error) goto out_trans_cancel; + } while (!done);
- error = xfs_trans_commit(tp); - } - + error = xfs_trans_commit(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); return error;
out_trans_cancel: xfs_trans_cancel(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; }
From: Brian Foster <bfoster@redhat.com>
commit 211683b21de959a647de74faedfdd8a5d189327e upstream.
The collapse range operation uses a unique transaction and ilock cycle for the hole punch and each extent shift iteration of the overall operation. While the hole punch is safe as a separate operation due to the iolock, cycling the ilock after each extent shift is risky w.r.t. concurrent operations, similar to insert range.
To avoid this problem, make collapse range atomic with respect to ilock. Hold the ilock across the entire operation, replace the individual transactions with a single rolling transaction sequence and finish dfops on each iteration to perform pending frees and roll the transaction. Remove the unnecessary quota reservation as collapse range can only ever merge extents (and thus remove extent records and potentially free bmap blocks). The dfops call automatically relogs the inode to keep it moving in the log. This guarantees that nothing else can change the extent mapping of an inode while a collapse range operation is in progress.
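The resulting loop shape (editor's sketch of the hunks below, not part of the patch; the single transaction is allocated once and the ilock held throughout):

	xfs_ilock(ip, XFS_ILOCK_EXCL);
	xfs_trans_ijoin(tp, ip, 0);

	while (!done) {
		error = xfs_bmap_collapse_extents(tp, ip, &next_fsb,
				shift_fsb, &done);
		if (error)
			goto out_trans_cancel;
		if (done)
			break;
		/* finish any deferred frees; this also rolls the transaction */
		error = xfs_defer_finish(&tp);
		if (error)
			goto out_trans_cancel;
	}

	error = xfs_trans_commit(tp);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);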
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_bmap_util.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)
--- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1237,7 +1237,6 @@ xfs_collapse_file_space( int error; xfs_fileoff_t next_fsb = XFS_B_TO_FSB(mp, offset + len); xfs_fileoff_t shift_fsb = XFS_B_TO_FSB(mp, len); - uint resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0); bool done = false;
ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); @@ -1253,32 +1252,34 @@ xfs_collapse_file_space( if (error) return error;
- while (!error && !done) { - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, - &tp); - if (error) - break; + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp); + if (error) + return error;
- xfs_ilock(ip, XFS_ILOCK_EXCL); - error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot, - ip->i_gdquot, ip->i_pdquot, resblks, 0, - XFS_QMOPT_RES_REGBLKS); - if (error) - goto out_trans_cancel; - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + xfs_ilock(ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, ip, 0);
+ while (!done) { error = xfs_bmap_collapse_extents(tp, ip, &next_fsb, shift_fsb, &done); if (error) goto out_trans_cancel; + if (done) + break;
- error = xfs_trans_commit(tp); + /* finish any deferred frees and roll the transaction */ + error = xfs_defer_finish(&tp); + if (error) + goto out_trans_cancel; }
+ error = xfs_trans_commit(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); return error;
out_trans_cancel: xfs_trans_cancel(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; }
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 8d57c21600a514d7a9237327c2496ae159bab5bb upstream.
Add a helper function to get rid of buffers that we have decided are corrupt after the verifiers have run. This function is intended to handle metadata checks that can't happen in the verifiers, such as inter-block relationship checking. Note that we now mark the buffer stale so that it will not end up on any LRU and will be purged on release.
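The call pattern, as used throughout the hunks below: a caller that fails a post-read consistency check marks the buffer and backs out, e.g. (editor's sketch, drawn from the xfs_da3_split() hunk):

	/* an inter-block relationship check a verifier cannot perform */
	if (be32_to_cpu(node->hdr.info.forw) != addblk->blkno) {
		xfs_buf_mark_corrupt(oldblk->bp);	/* log + stale; b_error untouched */
		error = -EFSCORRUPTED;
		goto out;
	}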
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_alloc.c     |  2 +-
 fs/xfs/libxfs/xfs_attr_leaf.c |  6 +++---
 fs/xfs/libxfs/xfs_btree.c     |  2 +-
 fs/xfs/libxfs/xfs_da_btree.c  | 10 +++++-----
 fs/xfs/libxfs/xfs_dir2_leaf.c |  2 +-
 fs/xfs/libxfs/xfs_dir2_node.c |  6 +++---
 fs/xfs/xfs_attr_inactive.c    |  6 +++---
 fs/xfs/xfs_attr_list.c        |  2 +-
 fs/xfs/xfs_buf.c              | 22 ++++++++++++++++++++++
 fs/xfs/xfs_buf.h              |  2 ++
 fs/xfs/xfs_error.c            |  2 ++
 fs/xfs/xfs_inode.c            |  4 ++--
 12 files changed, 46 insertions(+), 20 deletions(-)
--- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -685,7 +685,7 @@ xfs_alloc_update_counters( xfs_trans_agblocks_delta(tp, len); if (unlikely(be32_to_cpu(agf->agf_freeblks) > be32_to_cpu(agf->agf_length))) { - xfs_buf_corruption_error(agbp); + xfs_buf_mark_corrupt(agbp); return -EFSCORRUPTED; }
--- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -2288,7 +2288,7 @@ xfs_attr3_leaf_lookup_int( xfs_attr3_leaf_hdr_from_disk(args->geo, &ichdr, leaf); entries = xfs_attr3_leaf_entryp(leaf); if (ichdr.count >= args->geo->blksize / 8) { - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); return -EFSCORRUPTED; }
@@ -2307,11 +2307,11 @@ xfs_attr3_leaf_lookup_int( break; } if (!(probe >= 0 && (!ichdr.count || probe < ichdr.count))) { - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); return -EFSCORRUPTED; } if (!(span <= 4 || be32_to_cpu(entry->hashval) == hashval)) { - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); return -EFSCORRUPTED; }
--- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -1820,7 +1820,7 @@ xfs_btree_lookup_get_block(
out_bad: *blkp = NULL; - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); xfs_trans_brelse(cur->bc_tp, bp); return -EFSCORRUPTED; } --- a/fs/xfs/libxfs/xfs_da_btree.c +++ b/fs/xfs/libxfs/xfs_da_btree.c @@ -504,7 +504,7 @@ xfs_da3_split( node = oldblk->bp->b_addr; if (node->hdr.info.forw) { if (be32_to_cpu(node->hdr.info.forw) != addblk->blkno) { - xfs_buf_corruption_error(oldblk->bp); + xfs_buf_mark_corrupt(oldblk->bp); error = -EFSCORRUPTED; goto out; } @@ -517,7 +517,7 @@ xfs_da3_split( node = oldblk->bp->b_addr; if (node->hdr.info.back) { if (be32_to_cpu(node->hdr.info.back) != addblk->blkno) { - xfs_buf_corruption_error(oldblk->bp); + xfs_buf_mark_corrupt(oldblk->bp); error = -EFSCORRUPTED; goto out; } @@ -1544,7 +1544,7 @@ xfs_da3_node_lookup_int( }
if (magic != XFS_DA_NODE_MAGIC && magic != XFS_DA3_NODE_MAGIC) { - xfs_buf_corruption_error(blk->bp); + xfs_buf_mark_corrupt(blk->bp); return -EFSCORRUPTED; }
@@ -1559,7 +1559,7 @@ xfs_da3_node_lookup_int(
/* Tree taller than we can handle; bail out! */ if (nodehdr.level >= XFS_DA_NODE_MAXDEPTH) { - xfs_buf_corruption_error(blk->bp); + xfs_buf_mark_corrupt(blk->bp); return -EFSCORRUPTED; }
@@ -1567,7 +1567,7 @@ xfs_da3_node_lookup_int( if (blkno == args->geo->leafblk) expected_level = nodehdr.level - 1; else if (expected_level != nodehdr.level) { - xfs_buf_corruption_error(blk->bp); + xfs_buf_mark_corrupt(blk->bp); return -EFSCORRUPTED; } else expected_level--; --- a/fs/xfs/libxfs/xfs_dir2_leaf.c +++ b/fs/xfs/libxfs/xfs_dir2_leaf.c @@ -1344,7 +1344,7 @@ xfs_dir2_leaf_removename( ltp = xfs_dir2_leaf_tail_p(args->geo, leaf); bestsp = xfs_dir2_leaf_bests_p(ltp); if (be16_to_cpu(bestsp[db]) != oldbest) { - xfs_buf_corruption_error(lbp); + xfs_buf_mark_corrupt(lbp); return -EFSCORRUPTED; } /* --- a/fs/xfs/libxfs/xfs_dir2_node.c +++ b/fs/xfs/libxfs/xfs_dir2_node.c @@ -375,7 +375,7 @@ xfs_dir2_leaf_to_node( ltp = xfs_dir2_leaf_tail_p(args->geo, leaf); if (be32_to_cpu(ltp->bestcount) > (uint)dp->i_d.di_size / args->geo->blksize) { - xfs_buf_corruption_error(lbp); + xfs_buf_mark_corrupt(lbp); return -EFSCORRUPTED; }
@@ -449,7 +449,7 @@ xfs_dir2_leafn_add( * into other peoples memory */ if (index < 0) { - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); return -EFSCORRUPTED; }
@@ -745,7 +745,7 @@ xfs_dir2_leafn_lookup_for_entry(
xfs_dir3_leaf_check(dp, bp); if (leafhdr.count <= 0) { - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); return -EFSCORRUPTED; }
--- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -145,7 +145,7 @@ xfs_attr3_node_inactive( * Since this code is recursive (gasp!) we must protect ourselves. */ if (level > XFS_DA_NODE_MAXDEPTH) { - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); xfs_trans_brelse(*trans, bp); /* no locks for later trans */ return -EFSCORRUPTED; } @@ -196,7 +196,7 @@ xfs_attr3_node_inactive( error = xfs_attr3_leaf_inactive(trans, dp, child_bp); break; default: - xfs_buf_corruption_error(child_bp); + xfs_buf_mark_corrupt(child_bp); xfs_trans_brelse(*trans, child_bp); error = -EFSCORRUPTED; break; @@ -281,7 +281,7 @@ xfs_attr3_root_inactive( break; default: error = -EFSCORRUPTED; - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); xfs_trans_brelse(*trans, bp); break; } --- a/fs/xfs/xfs_attr_list.c +++ b/fs/xfs/xfs_attr_list.c @@ -271,7 +271,7 @@ xfs_attr_node_list_lookup( return 0;
out_corruptbuf: - xfs_buf_corruption_error(bp); + xfs_buf_mark_corrupt(bp); xfs_trans_brelse(tp, bp); return -EFSCORRUPTED; } --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -1547,6 +1547,28 @@ xfs_buf_zero( }
/* + * Log a message about and stale a buffer that a caller has decided is corrupt. + * + * This function should be called for the kinds of metadata corruption that + * cannot be detect from a verifier, such as incorrect inter-block relationship + * data. Do /not/ call this function from a verifier function. + * + * The buffer must be XBF_DONE prior to the call. Afterwards, the buffer will + * be marked stale, but b_error will not be set. The caller is responsible for + * releasing the buffer or fixing it. + */ +void +__xfs_buf_mark_corrupt( + struct xfs_buf *bp, + xfs_failaddr_t fa) +{ + ASSERT(bp->b_flags & XBF_DONE); + + xfs_buf_corruption_error(bp); + xfs_buf_stale(bp); +} + +/* * Handling of buffer targets (buftargs). */
--- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -270,6 +270,8 @@ static inline int xfs_buf_submit(struct }
void xfs_buf_zero(struct xfs_buf *bp, size_t boff, size_t bsize); +void __xfs_buf_mark_corrupt(struct xfs_buf *bp, xfs_failaddr_t fa); +#define xfs_buf_mark_corrupt(bp) __xfs_buf_mark_corrupt((bp), __this_address)
/* Buffer Utility Routines */ extern void *xfs_buf_offset(struct xfs_buf *, size_t); --- a/fs/xfs/xfs_error.c +++ b/fs/xfs/xfs_error.c @@ -345,6 +345,8 @@ xfs_corruption_error( * Complain about the kinds of metadata corruption that we can't detect from a * verifier, such as incorrect inter-block relationship data. Does not set * bp->b_error. + * + * Call xfs_buf_mark_corrupt, not this function. */ void xfs_buf_corruption_error( --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2149,7 +2149,7 @@ xfs_iunlink_update_bucket( * head of the list. */ if (old_value == new_agino) { - xfs_buf_corruption_error(agibp); + xfs_buf_mark_corrupt(agibp); return -EFSCORRUPTED; }
@@ -2283,7 +2283,7 @@ xfs_iunlink( next_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]); if (next_agino == agino || !xfs_verify_agino_or_null(mp, agno, next_agino)) { - xfs_buf_corruption_error(agibp); + xfs_buf_mark_corrupt(agibp); return -EFSCORRUPTED; }
From: "Darrick J. Wong" darrick.wong@oracle.com
commit e83cf875d67a6cb9ddfaa8b45d2fa93d12b5c66f upstream.
Add a xfs_failaddr_t parameter to this function so that callers can potentially pass in (and therefore report) the exact point in the code where we decided that a metadata buffer was corrupt. This enables us to wire it up to checking functions that have to run outside of verifiers.
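In combination with the xfs_buf_mark_corrupt() wrapper macro added by the previous patch, the reported address becomes the caller's own location rather than a fixed __return_address inside the error function (editor's sketch of the wiring at this point in the series):

	#define xfs_buf_mark_corrupt(bp) \
		__xfs_buf_mark_corrupt((bp), __this_address)

	/* ...which now propagates down to the report: */
	xfs_buf_corruption_error(bp, fa);	/* "Metadata corruption detected at %pS" */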
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_buf.c   | 2 +-
 fs/xfs/xfs_error.c | 5 +++--
 fs/xfs/xfs_error.h | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)
--- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -1564,7 +1564,7 @@ __xfs_buf_mark_corrupt( { ASSERT(bp->b_flags & XBF_DONE);
- xfs_buf_corruption_error(bp); + xfs_buf_corruption_error(bp, fa); xfs_buf_stale(bp); }
--- a/fs/xfs/xfs_error.c +++ b/fs/xfs/xfs_error.c @@ -350,13 +350,14 @@ xfs_corruption_error( */ void xfs_buf_corruption_error( - struct xfs_buf *bp) + struct xfs_buf *bp, + xfs_failaddr_t fa) { struct xfs_mount *mp = bp->b_mount;
xfs_alert_tag(mp, XFS_PTAG_VERIFIER_ERROR, "Metadata corruption detected at %pS, %s block 0x%llx", - __return_address, bp->b_ops->name, bp->b_bn); + fa, bp->b_ops->name, bp->b_bn);
xfs_alert(mp, "Unmount and run xfs_repair");
--- a/fs/xfs/xfs_error.h +++ b/fs/xfs/xfs_error.h @@ -15,7 +15,7 @@ extern void xfs_corruption_error(const c struct xfs_mount *mp, const void *buf, size_t bufsize, const char *filename, int linenum, xfs_failaddr_t failaddr); -void xfs_buf_corruption_error(struct xfs_buf *bp); +void xfs_buf_corruption_error(struct xfs_buf *bp, xfs_failaddr_t fa); extern void xfs_buf_verifier_error(struct xfs_buf *bp, int error, const char *name, const void *buf, size_t bufsz, xfs_failaddr_t failaddr);
From: "Darrick J. Wong" darrick.wong@oracle.com
commit ce99494c9699df58b31d0a839e957f86cd58c755 upstream.
xfs_verifier_error is supposed to be called on a corrupt metadata buffer from within a buffer verifier function, whereas xfs_buf_mark_corrupt is the function to be called when a piece of code has read a buffer and catches something that a read verifier cannot. The first function sets b_error anticipating that the low level buffer handling code will see the nonzero b_error and clear XBF_DONE on the buffer, whereas the second function does not.
Since xfs_dir3_free_header_check examines fields in the dir free block header that require more context than can be provided to read verifiers, we must call xfs_buf_mark_corrupt when it finds a problem.
Switching the calls has a secondary effect that we no longer corrupt the buffer state by setting b_error and leaving XBF_DONE set. When /that/ happens, we'll trip over various state assertions (most commonly the b_error check in xfs_buf_reverify) on a subsequent attempt to read the buffer.
Fixes: bc1a09b8e334bf5f ("xfs: refactor verifier callers to print address of failing check")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_dir2_node.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -208,7 +208,7 @@ __xfs_dir3_free_read(
 	/* Check things that we can't do in the verifier. */
 	fa = xfs_dir3_free_header_check(dp, fbno, *bpp);
 	if (fa) {
-		xfs_verifier_error(*bpp, -EFSCORRUPTED, fa);
+		__xfs_buf_mark_corrupt(*bpp, fa);
 		xfs_trans_brelse(tp, *bpp);
 		*bpp = NULL;
 		return -EFSCORRUPTED;
From: "Darrick J. Wong" darrick.wong@oracle.com
commit a10c21ed5d5241d11cf1d5a4556730840572900b upstream.
[Slightly edit xfs_dir3_data_read() to work with existing mapped_bno argument instead of flag values introduced in later kernels]
Check the owner field of dir3 data block headers. If it's corrupt, release the buffer and return EFSCORRUPTED. All callers handle this properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_dir2_data.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)
--- a/fs/xfs/libxfs/xfs_dir2_data.c +++ b/fs/xfs/libxfs/xfs_dir2_data.c @@ -348,6 +348,22 @@ static const struct xfs_buf_ops xfs_dir3 .verify_write = xfs_dir3_data_write_verify, };
+static xfs_failaddr_t +xfs_dir3_data_header_check( + struct xfs_inode *dp, + struct xfs_buf *bp) +{ + struct xfs_mount *mp = dp->i_mount; + + if (xfs_sb_version_hascrc(&mp->m_sb)) { + struct xfs_dir3_data_hdr *hdr3 = bp->b_addr; + + if (be64_to_cpu(hdr3->hdr.owner) != dp->i_ino) + return __this_address; + } + + return NULL; +}
int xfs_dir3_data_read( @@ -357,12 +373,24 @@ xfs_dir3_data_read( xfs_daddr_t mapped_bno, struct xfs_buf **bpp) { + xfs_failaddr_t fa; int err;
err = xfs_da_read_buf(tp, dp, bno, mapped_bno, bpp, XFS_DATA_FORK, &xfs_dir3_data_buf_ops); - if (!err && tp && *bpp) - xfs_trans_buf_set_type(tp, *bpp, XFS_BLFT_DIR_DATA_BUF); + if (err || !*bpp) + return err; + + /* Check things that we can't do in the verifier. */ + fa = xfs_dir3_data_header_check(dp, *bpp); + if (fa) { + __xfs_buf_mark_corrupt(*bpp, fa); + xfs_trans_brelse(tp, *bpp); + *bpp = NULL; + return -EFSCORRUPTED; + } + + xfs_trans_buf_set_type(tp, *bpp, XFS_BLFT_DIR_DATA_BUF); return err; }
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 1b2c1a63b678d63e9c98314d44413f5af79c9c80 upstream.
Check the owner field of dir3 block headers. If it's corrupt, release the buffer and return EFSCORRUPTED. All callers handle this properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_dir2_block.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)
--- a/fs/xfs/libxfs/xfs_dir2_block.c +++ b/fs/xfs/libxfs/xfs_dir2_block.c @@ -114,6 +114,23 @@ const struct xfs_buf_ops xfs_dir3_block_ .verify_struct = xfs_dir3_block_verify, };
+static xfs_failaddr_t +xfs_dir3_block_header_check( + struct xfs_inode *dp, + struct xfs_buf *bp) +{ + struct xfs_mount *mp = dp->i_mount; + + if (xfs_sb_version_hascrc(&mp->m_sb)) { + struct xfs_dir3_blk_hdr *hdr3 = bp->b_addr; + + if (be64_to_cpu(hdr3->owner) != dp->i_ino) + return __this_address; + } + + return NULL; +} + int xfs_dir3_block_read( struct xfs_trans *tp, @@ -121,12 +138,24 @@ xfs_dir3_block_read( struct xfs_buf **bpp) { struct xfs_mount *mp = dp->i_mount; + xfs_failaddr_t fa; int err;
err = xfs_da_read_buf(tp, dp, mp->m_dir_geo->datablk, -1, bpp, XFS_DATA_FORK, &xfs_dir3_block_buf_ops); - if (!err && tp && *bpp) - xfs_trans_buf_set_type(tp, *bpp, XFS_BLFT_DIR_BLOCK_BUF); + if (err || !*bpp) + return err; + + /* Check things that we can't do in the verifier. */ + fa = xfs_dir3_block_header_check(dp, *bpp); + if (fa) { + __xfs_buf_mark_corrupt(*bpp, fa); + xfs_trans_brelse(tp, *bpp); + *bpp = NULL; + return -EFSCORRUPTED; + } + + xfs_trans_buf_set_type(tp, *bpp, XFS_BLFT_DIR_BLOCK_BUF); return err; }
From: Takashi Iwai <tiwai@suse.de>
commit 17bb60b74124e9491d593e2601e3afe14daa2f57 upstream.
Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf().
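For illustration only (plain userspace C, not part of the patch), the failure mode: snprintf() returns the length the output would have had, so accumulating its return value can push the write offset past the end of the buffer and make the next size argument underflow:

	#include <stdio.h>

	int main(void)
	{
		char buf[8];
		int len = 0;

		/* would-be output is 11 chars; only 7 plus the NUL fit */
		len += snprintf(buf + len, sizeof(buf) - len, "hello world");

		/*
		 * len is now 11, past the 8-byte buffer; a further call would
		 * start at buf + 11 with a size_t that underflowed to a huge
		 * value. The kernel's scnprintf() returns the bytes actually
		 * written, so len would stay at 7 and remain in bounds.
		 */
		printf("len = %d\n", len);	/* prints 11, not 7 */
		return 0;
	}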
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_stats.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/xfs/xfs_stats.c +++ b/fs/xfs/xfs_stats.c @@ -57,13 +57,13 @@ int xfs_stats_format(struct xfsstats __p /* Loop over all stats groups */
for (i = j = 0; i < ARRAY_SIZE(xstats); i++) { - len += snprintf(buf + len, PATH_MAX - len, "%s", + len += scnprintf(buf + len, PATH_MAX - len, "%s", xstats[i].desc); /* inner loop does each group */ for (; j < xstats[i].endpoint; j++) - len += snprintf(buf + len, PATH_MAX - len, " %u", + len += scnprintf(buf + len, PATH_MAX - len, " %u", counter_val(stats, j)); - len += snprintf(buf + len, PATH_MAX - len, "\n"); + len += scnprintf(buf + len, PATH_MAX - len, "\n"); } /* extra precision counters */ for_each_possible_cpu(i) { @@ -72,9 +72,9 @@ int xfs_stats_format(struct xfsstats __p xs_read_bytes += per_cpu_ptr(stats, i)->s.xs_read_bytes; }
- len += snprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n", + len += scnprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n", xs_xstrat_bytes, xs_write_bytes, xs_read_bytes); - len += snprintf(buf + len, PATH_MAX-len, "debug %u\n", + len += scnprintf(buf + len, PATH_MAX-len, "debug %u\n", #if defined(DEBUG) 1); #else
From: Pavel Reichl <preichl@redhat.com>
commit aefe69a45d84901c702f87672ec1e93de1d03f73 upstream.
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix some of the comments]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_dquot_buf.c  |  8 +--
 fs/xfs/libxfs/xfs_format.h     | 10 ++--
 fs/xfs/libxfs/xfs_trans_resv.c |  2
 fs/xfs/xfs_dquot.c             | 18 +++---
 fs/xfs/xfs_dquot.h             | 98 ++++++++++++++++++---------------
 fs/xfs/xfs_log_recover.c       |  5 +-
 fs/xfs/xfs_qm.c                | 30 ++++++------
 fs/xfs/xfs_qm_bhv.c            |  6 +-
 fs/xfs/xfs_trans_dquot.c       | 44 +++++++++---------
 9 files changed, 112 insertions(+), 109 deletions(-)
--- a/fs/xfs/libxfs/xfs_dquot_buf.c +++ b/fs/xfs/libxfs/xfs_dquot_buf.c @@ -35,10 +35,10 @@ xfs_calc_dquots_per_chunk(
xfs_failaddr_t xfs_dquot_verify( - struct xfs_mount *mp, - xfs_disk_dquot_t *ddq, - xfs_dqid_t id, - uint type) /* used only during quotacheck */ + struct xfs_mount *mp, + struct xfs_disk_dquot *ddq, + xfs_dqid_t id, + uint type) /* used only during quotacheck */ { /* * We can encounter an uninitialized dquot buffer for 2 reasons: --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1144,11 +1144,11 @@ static inline void xfs_dinode_put_rdev(s
/* * This is the main portion of the on-disk representation of quota - * information for a user. This is the q_core of the xfs_dquot_t that + * information for a user. This is the q_core of the struct xfs_dquot that * is kept in kernel memory. We pad this with some more expansion room * to construct the on disk structure. */ -typedef struct xfs_disk_dquot { +struct xfs_disk_dquot { __be16 d_magic; /* dquot magic = XFS_DQUOT_MAGIC */ __u8 d_version; /* dquot version */ __u8 d_flags; /* XFS_DQ_USER/PROJ/GROUP */ @@ -1171,15 +1171,15 @@ typedef struct xfs_disk_dquot { __be32 d_rtbtimer; /* similar to above; for RT disk blocks */ __be16 d_rtbwarns; /* warnings issued wrt RT disk blocks */ __be16 d_pad; -} xfs_disk_dquot_t; +};
/* * This is what goes on disk. This is separated from the xfs_disk_dquot because * carrying the unnecessary padding would be a waste of memory. */ typedef struct xfs_dqblk { - xfs_disk_dquot_t dd_diskdq; /* portion that lives incore as well */ - char dd_fill[4]; /* filling for posterity */ + struct xfs_disk_dquot dd_diskdq; /* portion living incore as well */ + char dd_fill[4];/* filling for posterity */
/* * These two are only present on filesystems with the CRC bits set. --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -776,7 +776,7 @@ xfs_calc_clear_agi_bucket_reservation(
/* * Adjusting quota limits. - * the xfs_disk_dquot_t: sizeof(struct xfs_disk_dquot) + * the disk quota buffer: sizeof(struct xfs_disk_dquot) */ STATIC uint xfs_calc_qm_setqlim_reservation(void) --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -48,7 +48,7 @@ static struct lock_class_key xfs_dquot_p */ void xfs_qm_dqdestroy( - xfs_dquot_t *dqp) + struct xfs_dquot *dqp) { ASSERT(list_empty(&dqp->q_lru));
@@ -113,8 +113,8 @@ xfs_qm_adjust_dqlimits( */ void xfs_qm_adjust_dqtimers( - xfs_mount_t *mp, - xfs_disk_dquot_t *d) + struct xfs_mount *mp, + struct xfs_disk_dquot *d) { ASSERT(d->d_id);
@@ -497,7 +497,7 @@ xfs_dquot_from_disk( struct xfs_disk_dquot *ddqp = bp->b_addr + dqp->q_bufoffset;
/* copy everything from disk dquot to the incore dquot */ - memcpy(&dqp->q_core, ddqp, sizeof(xfs_disk_dquot_t)); + memcpy(&dqp->q_core, ddqp, sizeof(struct xfs_disk_dquot));
/* * Reservation counters are defined as reservation plus current usage @@ -989,7 +989,7 @@ xfs_qm_dqput( */ void xfs_qm_dqrele( - xfs_dquot_t *dqp) + struct xfs_dquot *dqp) { if (!dqp) return; @@ -1019,7 +1019,7 @@ xfs_qm_dqflush_done( struct xfs_log_item *lip) { xfs_dq_logitem_t *qip = (struct xfs_dq_logitem *)lip; - xfs_dquot_t *dqp = qip->qli_dquot; + struct xfs_dquot *dqp = qip->qli_dquot; struct xfs_ail *ailp = lip->li_ailp;
/* @@ -1129,7 +1129,7 @@ xfs_qm_dqflush( }
/* This is the only portion of data that needs to persist */ - memcpy(ddqp, &dqp->q_core, sizeof(xfs_disk_dquot_t)); + memcpy(ddqp, &dqp->q_core, sizeof(struct xfs_disk_dquot));
/* * Clear the dirty field and remember the flush lsn for later use. @@ -1187,8 +1187,8 @@ out_unlock: */ void xfs_dqlock2( - xfs_dquot_t *d1, - xfs_dquot_t *d2) + struct xfs_dquot *d1, + struct xfs_dquot *d2) { if (d1 && d2) { ASSERT(d1 != d2); --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -30,33 +30,36 @@ enum { /* * The incore dquot structure */ -typedef struct xfs_dquot { - uint dq_flags; /* various flags (XFS_DQ_*) */ - struct list_head q_lru; /* global free list of dquots */ - struct xfs_mount*q_mount; /* filesystem this relates to */ - uint q_nrefs; /* # active refs from inodes */ - xfs_daddr_t q_blkno; /* blkno of dquot buffer */ - int q_bufoffset; /* off of dq in buffer (# dquots) */ - xfs_fileoff_t q_fileoffset; /* offset in quotas file */ - - xfs_disk_dquot_t q_core; /* actual usage & quotas */ - xfs_dq_logitem_t q_logitem; /* dquot log item */ - xfs_qcnt_t q_res_bcount; /* total regular nblks used+reserved */ - xfs_qcnt_t q_res_icount; /* total inos allocd+reserved */ - xfs_qcnt_t q_res_rtbcount;/* total realtime blks used+reserved */ - xfs_qcnt_t q_prealloc_lo_wmark;/* prealloc throttle wmark */ - xfs_qcnt_t q_prealloc_hi_wmark;/* prealloc disabled wmark */ - int64_t q_low_space[XFS_QLOWSP_MAX]; - struct mutex q_qlock; /* quota lock */ - struct completion q_flush; /* flush completion queue */ - atomic_t q_pincount; /* dquot pin count */ - wait_queue_head_t q_pinwait; /* dquot pinning wait queue */ -} xfs_dquot_t; +struct xfs_dquot { + uint dq_flags; + struct list_head q_lru; + struct xfs_mount *q_mount; + uint q_nrefs; + xfs_daddr_t q_blkno; + int q_bufoffset; + xfs_fileoff_t q_fileoffset; + + struct xfs_disk_dquot q_core; + xfs_dq_logitem_t q_logitem; + /* total regular nblks used+reserved */ + xfs_qcnt_t q_res_bcount; + /* total inos allocd+reserved */ + xfs_qcnt_t q_res_icount; + /* total realtime blks used+reserved */ + xfs_qcnt_t q_res_rtbcount; + xfs_qcnt_t q_prealloc_lo_wmark; + xfs_qcnt_t q_prealloc_hi_wmark; + int64_t q_low_space[XFS_QLOWSP_MAX]; + struct mutex q_qlock; + struct completion q_flush; + atomic_t q_pincount; + struct wait_queue_head q_pinwait; +};
/* * Lock hierarchy for q_qlock: * XFS_QLOCK_NORMAL is the implicit default, - * XFS_QLOCK_NESTED is the dquot with the higher id in xfs_dqlock2 + * XFS_QLOCK_NESTED is the dquot with the higher id in xfs_dqlock2 */ enum { XFS_QLOCK_NORMAL = 0, @@ -64,21 +67,21 @@ enum { };
/* - * Manage the q_flush completion queue embedded in the dquot. This completion + * Manage the q_flush completion queue embedded in the dquot. This completion * queue synchronizes processes attempting to flush the in-core dquot back to * disk. */ -static inline void xfs_dqflock(xfs_dquot_t *dqp) +static inline void xfs_dqflock(struct xfs_dquot *dqp) { wait_for_completion(&dqp->q_flush); }
-static inline bool xfs_dqflock_nowait(xfs_dquot_t *dqp) +static inline bool xfs_dqflock_nowait(struct xfs_dquot *dqp) { return try_wait_for_completion(&dqp->q_flush); }
-static inline void xfs_dqfunlock(xfs_dquot_t *dqp) +static inline void xfs_dqfunlock(struct xfs_dquot *dqp) { complete(&dqp->q_flush); } @@ -112,7 +115,7 @@ static inline int xfs_this_quota_on(stru } }
-static inline xfs_dquot_t *xfs_inode_dquot(struct xfs_inode *ip, int type) +static inline struct xfs_dquot *xfs_inode_dquot(struct xfs_inode *ip, int type) { switch (type & XFS_DQ_ALLTYPES) { case XFS_DQ_USER: @@ -147,31 +150,30 @@ static inline bool xfs_dquot_lowsp(struc #define XFS_QM_ISPDQ(dqp) ((dqp)->dq_flags & XFS_DQ_PROJ) #define XFS_QM_ISGDQ(dqp) ((dqp)->dq_flags & XFS_DQ_GROUP)
-extern void xfs_qm_dqdestroy(xfs_dquot_t *); -extern int xfs_qm_dqflush(struct xfs_dquot *, struct xfs_buf **); -extern void xfs_qm_dqunpin_wait(xfs_dquot_t *); -extern void xfs_qm_adjust_dqtimers(xfs_mount_t *, - xfs_disk_dquot_t *); -extern void xfs_qm_adjust_dqlimits(struct xfs_mount *, - struct xfs_dquot *); -extern xfs_dqid_t xfs_qm_id_for_quotatype(struct xfs_inode *ip, - uint type); -extern int xfs_qm_dqget(struct xfs_mount *mp, xfs_dqid_t id, +void xfs_qm_dqdestroy(struct xfs_dquot *dqp); +int xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf **bpp); +void xfs_qm_dqunpin_wait(struct xfs_dquot *dqp); +void xfs_qm_adjust_dqtimers(struct xfs_mount *mp, + struct xfs_disk_dquot *d); +void xfs_qm_adjust_dqlimits(struct xfs_mount *mp, + struct xfs_dquot *d); +xfs_dqid_t xfs_qm_id_for_quotatype(struct xfs_inode *ip, uint type); +int xfs_qm_dqget(struct xfs_mount *mp, xfs_dqid_t id, uint type, bool can_alloc, struct xfs_dquot **dqpp); -extern int xfs_qm_dqget_inode(struct xfs_inode *ip, uint type, - bool can_alloc, - struct xfs_dquot **dqpp); -extern int xfs_qm_dqget_next(struct xfs_mount *mp, xfs_dqid_t id, +int xfs_qm_dqget_inode(struct xfs_inode *ip, uint type, + bool can_alloc, + struct xfs_dquot **dqpp); +int xfs_qm_dqget_next(struct xfs_mount *mp, xfs_dqid_t id, uint type, struct xfs_dquot **dqpp); -extern int xfs_qm_dqget_uncached(struct xfs_mount *mp, - xfs_dqid_t id, uint type, - struct xfs_dquot **dqpp); -extern void xfs_qm_dqput(xfs_dquot_t *); +int xfs_qm_dqget_uncached(struct xfs_mount *mp, + xfs_dqid_t id, uint type, + struct xfs_dquot **dqpp); +void xfs_qm_dqput(struct xfs_dquot *dqp);
-extern void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *); +void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *);
-extern void xfs_dquot_set_prealloc_limits(struct xfs_dquot *); +void xfs_dquot_set_prealloc_limits(struct xfs_dquot *);
static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp) { --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2577,6 +2577,7 @@ xlog_recover_do_reg_buffer( int bit; int nbits; xfs_failaddr_t fa; + const size_t size_disk_dquot = sizeof(struct xfs_disk_dquot);
trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
@@ -2619,7 +2620,7 @@ xlog_recover_do_reg_buffer( "XFS: NULL dquot in %s.", __func__); goto next; } - if (item->ri_buf[i].i_len < sizeof(xfs_disk_dquot_t)) { + if (item->ri_buf[i].i_len < size_disk_dquot) { xfs_alert(mp, "XFS: dquot too small (%d) in %s.", item->ri_buf[i].i_len, __func__); @@ -3250,7 +3251,7 @@ xlog_recover_dquot_pass2( xfs_alert(log->l_mp, "NULL dquot in %s.", __func__); return -EFSCORRUPTED; } - if (item->ri_buf[1].i_len < sizeof(xfs_disk_dquot_t)) { + if (item->ri_buf[1].i_len < sizeof(struct xfs_disk_dquot)) { xfs_alert(log->l_mp, "dquot too small (%d) in %s.", item->ri_buf[1].i_len, __func__); return -EFSCORRUPTED; --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -244,14 +244,14 @@ xfs_qm_unmount_quotas(
STATIC int xfs_qm_dqattach_one( - xfs_inode_t *ip, - xfs_dqid_t id, - uint type, - bool doalloc, - xfs_dquot_t **IO_idqpp) + struct xfs_inode *ip, + xfs_dqid_t id, + uint type, + bool doalloc, + struct xfs_dquot **IO_idqpp) { - xfs_dquot_t *dqp; - int error; + struct xfs_dquot *dqp; + int error;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); error = 0; @@ -544,8 +544,8 @@ xfs_qm_set_defquota( uint type, xfs_quotainfo_t *qinf) { - xfs_dquot_t *dqp; - struct xfs_def_quota *defq; + struct xfs_dquot *dqp; + struct xfs_def_quota *defq; struct xfs_disk_dquot *ddqp; int error;
@@ -1746,14 +1746,14 @@ error_rele: * Actually transfer ownership, and do dquot modifications. * These were already reserved. */ -xfs_dquot_t * +struct xfs_dquot * xfs_qm_vop_chown( - xfs_trans_t *tp, - xfs_inode_t *ip, - xfs_dquot_t **IO_olddq, - xfs_dquot_t *newdq) + struct xfs_trans *tp, + struct xfs_inode *ip, + struct xfs_dquot **IO_olddq, + struct xfs_dquot *newdq) { - xfs_dquot_t *prevdq; + struct xfs_dquot *prevdq; uint bfield = XFS_IS_REALTIME_INODE(ip) ? XFS_TRANS_DQ_RTBCOUNT : XFS_TRANS_DQ_BCOUNT;
--- a/fs/xfs/xfs_qm_bhv.c +++ b/fs/xfs/xfs_qm_bhv.c @@ -54,11 +54,11 @@ xfs_fill_statvfs_from_dquot( */ void xfs_qm_statvfs( - xfs_inode_t *ip, + struct xfs_inode *ip, struct kstatfs *statp) { - xfs_mount_t *mp = ip->i_mount; - xfs_dquot_t *dqp; + struct xfs_mount *mp = ip->i_mount; + struct xfs_dquot *dqp;
if (!xfs_qm_dqget(mp, xfs_get_projid(ip), XFS_DQ_PROJ, false, &dqp)) { xfs_fill_statvfs_from_dquot(statp, dqp); --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -25,8 +25,8 @@ STATIC void xfs_trans_alloc_dqinfo(xfs_t */ void xfs_trans_dqjoin( - xfs_trans_t *tp, - xfs_dquot_t *dqp) + struct xfs_trans *tp, + struct xfs_dquot *dqp) { ASSERT(XFS_DQ_IS_LOCKED(dqp)); ASSERT(dqp->q_logitem.qli_dquot == dqp); @@ -49,8 +49,8 @@ xfs_trans_dqjoin( */ void xfs_trans_log_dquot( - xfs_trans_t *tp, - xfs_dquot_t *dqp) + struct xfs_trans *tp, + struct xfs_dquot *dqp) { ASSERT(XFS_DQ_IS_LOCKED(dqp));
@@ -486,12 +486,12 @@ xfs_trans_apply_dquot_deltas( */ void xfs_trans_unreserve_and_mod_dquots( - xfs_trans_t *tp) + struct xfs_trans *tp) { int i, j; - xfs_dquot_t *dqp; + struct xfs_dquot *dqp; struct xfs_dqtrx *qtrx, *qa; - bool locked; + bool locked;
if (!tp->t_dqinfo || !(tp->t_flags & XFS_TRANS_DQ_DIRTY)) return; @@ -571,21 +571,21 @@ xfs_quota_warn( */ STATIC int xfs_trans_dqresv( - xfs_trans_t *tp, - xfs_mount_t *mp, - xfs_dquot_t *dqp, - int64_t nblks, - long ninos, - uint flags) -{ - xfs_qcnt_t hardlimit; - xfs_qcnt_t softlimit; - time_t timer; - xfs_qwarncnt_t warns; - xfs_qwarncnt_t warnlimit; - xfs_qcnt_t total_count; - xfs_qcnt_t *resbcountp; - xfs_quotainfo_t *q = mp->m_quotainfo; + struct xfs_trans *tp, + struct xfs_mount *mp, + struct xfs_dquot *dqp, + int64_t nblks, + long ninos, + uint flags) +{ + xfs_qcnt_t hardlimit; + xfs_qcnt_t softlimit; + time_t timer; + xfs_qwarncnt_t warns; + xfs_qwarncnt_t warnlimit; + xfs_qcnt_t total_count; + xfs_qcnt_t *resbcountp; + xfs_quotainfo_t *q = mp->m_quotainfo; struct xfs_def_quota *defq;
From: Pavel Reichl <preichl@redhat.com>
commit fd8b81dbbb23d4a3508cfac83256b4f5e770941c upstream.
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_dquot.c      |  2 +-
 fs/xfs/xfs_dquot.h      |  2 +-
 fs/xfs/xfs_dquot_item.h | 10 +++++-----
 3 files changed, 7 insertions(+), 7 deletions(-)
--- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1018,7 +1018,7 @@ xfs_qm_dqflush_done( struct xfs_buf *bp, struct xfs_log_item *lip) { - xfs_dq_logitem_t *qip = (struct xfs_dq_logitem *)lip; + struct xfs_dq_logitem *qip = (struct xfs_dq_logitem *)lip; struct xfs_dquot *dqp = qip->qli_dquot; struct xfs_ail *ailp = lip->li_ailp;
--- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -40,7 +40,7 @@ struct xfs_dquot { xfs_fileoff_t q_fileoffset;
struct xfs_disk_dquot q_core; - xfs_dq_logitem_t q_logitem; + struct xfs_dq_logitem q_logitem; /* total regular nblks used+reserved */ xfs_qcnt_t q_res_bcount; /* total inos allocd+reserved */ --- a/fs/xfs/xfs_dquot_item.h +++ b/fs/xfs/xfs_dquot_item.h @@ -11,11 +11,11 @@ struct xfs_trans; struct xfs_mount; struct xfs_qoff_logitem;
-typedef struct xfs_dq_logitem { - struct xfs_log_item qli_item; /* common portion */ - struct xfs_dquot *qli_dquot; /* dquot ptr */ - xfs_lsn_t qli_flush_lsn; /* lsn at last flush */ -} xfs_dq_logitem_t; +struct xfs_dq_logitem { + struct xfs_log_item qli_item; /* common portion */ + struct xfs_dquot *qli_dquot; /* dquot ptr */ + xfs_lsn_t qli_flush_lsn; /* lsn at last flush */ +};
typedef struct xfs_qoff_logitem { struct xfs_log_item qql_item; /* common portion */
From: Pavel Reichl <preichl@redhat.com>
commit d0bdfb106907e4a3ef4f25f6d27e392abf41f3a0 upstream.
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix a comment]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_trans_resv.c |  4 ++--
 fs/xfs/xfs_dquot_item.h        | 28 +++++++++++++++-------------
 fs/xfs/xfs_qm_syscalls.c       | 29 ++++++++++++++++-------------
 fs/xfs/xfs_trans_dquot.c       | 12 ++++++------
 4 files changed, 39 insertions(+), 34 deletions(-)
--- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -800,7 +800,7 @@ xfs_calc_qm_dqalloc_reservation(
/* * Turning off quotas. - * the xfs_qoff_logitem_t: sizeof(struct xfs_qoff_logitem) * 2 + * the quota off logitems: sizeof(struct xfs_qoff_logitem) * 2 * the superblock for the quota flags: sector size */ STATIC uint @@ -813,7 +813,7 @@ xfs_calc_qm_quotaoff_reservation(
/* * End of turning off quotas. - * the xfs_qoff_logitem_t: sizeof(struct xfs_qoff_logitem) * 2 + * the quota off logitems: sizeof(struct xfs_qoff_logitem) * 2 */ STATIC uint xfs_calc_qm_quotaoff_end_reservation(void) --- a/fs/xfs/xfs_dquot_item.h +++ b/fs/xfs/xfs_dquot_item.h @@ -12,24 +12,26 @@ struct xfs_mount; struct xfs_qoff_logitem;
struct xfs_dq_logitem { - struct xfs_log_item qli_item; /* common portion */ + struct xfs_log_item qli_item; /* common portion */ struct xfs_dquot *qli_dquot; /* dquot ptr */ - xfs_lsn_t qli_flush_lsn; /* lsn at last flush */ + xfs_lsn_t qli_flush_lsn; /* lsn at last flush */ };
-typedef struct xfs_qoff_logitem { - struct xfs_log_item qql_item; /* common portion */ - struct xfs_qoff_logitem *qql_start_lip; /* qoff-start logitem, if any */ +struct xfs_qoff_logitem { + struct xfs_log_item qql_item; /* common portion */ + struct xfs_qoff_logitem *qql_start_lip; /* qoff-start logitem, if any */ unsigned int qql_flags; -} xfs_qoff_logitem_t; +};
-extern void xfs_qm_dquot_logitem_init(struct xfs_dquot *); -extern xfs_qoff_logitem_t *xfs_qm_qoff_logitem_init(struct xfs_mount *, - struct xfs_qoff_logitem *, uint); -extern xfs_qoff_logitem_t *xfs_trans_get_qoff_item(struct xfs_trans *, - struct xfs_qoff_logitem *, uint); -extern void xfs_trans_log_quotaoff_item(struct xfs_trans *, - struct xfs_qoff_logitem *); +void xfs_qm_dquot_logitem_init(struct xfs_dquot *dqp); +struct xfs_qoff_logitem *xfs_qm_qoff_logitem_init(struct xfs_mount *mp, + struct xfs_qoff_logitem *start, + uint flags); +struct xfs_qoff_logitem *xfs_trans_get_qoff_item(struct xfs_trans *tp, + struct xfs_qoff_logitem *startqoff, + uint flags); +void xfs_trans_log_quotaoff_item(struct xfs_trans *tp, + struct xfs_qoff_logitem *qlp);
#endif /* __XFS_DQUOT_ITEM_H__ */ --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -19,9 +19,12 @@ #include "xfs_qm.h" #include "xfs_icache.h"
-STATIC int xfs_qm_log_quotaoff(xfs_mount_t *, xfs_qoff_logitem_t **, uint); -STATIC int xfs_qm_log_quotaoff_end(xfs_mount_t *, xfs_qoff_logitem_t *, - uint); +STATIC int xfs_qm_log_quotaoff(struct xfs_mount *mp, + struct xfs_qoff_logitem **qoffstartp, + uint flags); +STATIC int xfs_qm_log_quotaoff_end(struct xfs_mount *mp, + struct xfs_qoff_logitem *startqoff, + uint flags);
/* * Turn off quota accounting and/or enforcement for all udquots and/or @@ -40,7 +43,7 @@ xfs_qm_scall_quotaoff( uint dqtype; int error; uint inactivate_flags; - xfs_qoff_logitem_t *qoffstart; + struct xfs_qoff_logitem *qoffstart;
/* * No file system can have quotas enabled on disk but not in core. @@ -540,13 +543,13 @@ out_unlock:
STATIC int xfs_qm_log_quotaoff_end( - xfs_mount_t *mp, - xfs_qoff_logitem_t *startqoff, + struct xfs_mount *mp, + struct xfs_qoff_logitem *startqoff, uint flags) { - xfs_trans_t *tp; + struct xfs_trans *tp; int error; - xfs_qoff_logitem_t *qoffi; + struct xfs_qoff_logitem *qoffi;
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp); if (error) @@ -568,13 +571,13 @@ xfs_qm_log_quotaoff_end(
STATIC int xfs_qm_log_quotaoff( - xfs_mount_t *mp, - xfs_qoff_logitem_t **qoffstartp, - uint flags) + struct xfs_mount *mp, + struct xfs_qoff_logitem **qoffstartp, + uint flags) { - xfs_trans_t *tp; + struct xfs_trans *tp; int error; - xfs_qoff_logitem_t *qoffi; + struct xfs_qoff_logitem *qoffi;
*qoffstartp = NULL;
--- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -824,13 +824,13 @@ xfs_trans_reserve_quota_nblks( /* * This routine is called to allocate a quotaoff log item. */ -xfs_qoff_logitem_t * +struct xfs_qoff_logitem * xfs_trans_get_qoff_item( - xfs_trans_t *tp, - xfs_qoff_logitem_t *startqoff, + struct xfs_trans *tp, + struct xfs_qoff_logitem *startqoff, uint flags) { - xfs_qoff_logitem_t *q; + struct xfs_qoff_logitem *q;
ASSERT(tp != NULL);
@@ -852,8 +852,8 @@ xfs_trans_get_qoff_item( */ void xfs_trans_log_quotaoff_item( - xfs_trans_t *tp, - xfs_qoff_logitem_t *qlp) + struct xfs_trans *tp, + struct xfs_qoff_logitem *qlp) { tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &qlp->qql_item.li_flags);
From: Pavel Reichl <preichl@redhat.com>
commit 1cc95e6f0d7cfd61c9d3c5cdd4e7345b173f764f upstream.
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix typo in subject line]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_qm_syscalls.c | 140 ++++++++++++++++++++++-------------------------
 1 file changed, 66 insertions(+), 74 deletions(-)
--- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -19,12 +19,72 @@ #include "xfs_qm.h" #include "xfs_icache.h"
-STATIC int xfs_qm_log_quotaoff(struct xfs_mount *mp, - struct xfs_qoff_logitem **qoffstartp, - uint flags); -STATIC int xfs_qm_log_quotaoff_end(struct xfs_mount *mp, - struct xfs_qoff_logitem *startqoff, - uint flags); +STATIC int +xfs_qm_log_quotaoff( + struct xfs_mount *mp, + struct xfs_qoff_logitem **qoffstartp, + uint flags) +{ + struct xfs_trans *tp; + int error; + struct xfs_qoff_logitem *qoffi; + + *qoffstartp = NULL; + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp); + if (error) + goto out; + + qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT); + xfs_trans_log_quotaoff_item(tp, qoffi); + + spin_lock(&mp->m_sb_lock); + mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL; + spin_unlock(&mp->m_sb_lock); + + xfs_log_sb(tp); + + /* + * We have to make sure that the transaction is secure on disk before we + * return and actually stop quota accounting. So, make it synchronous. + * We don't care about quotoff's performance. + */ + xfs_trans_set_sync(tp); + error = xfs_trans_commit(tp); + if (error) + goto out; + + *qoffstartp = qoffi; +out: + return error; +} + +STATIC int +xfs_qm_log_quotaoff_end( + struct xfs_mount *mp, + struct xfs_qoff_logitem *startqoff, + uint flags) +{ + struct xfs_trans *tp; + int error; + struct xfs_qoff_logitem *qoffi; + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp); + if (error) + return error; + + qoffi = xfs_trans_get_qoff_item(tp, startqoff, + flags & XFS_ALL_QUOTA_ACCT); + xfs_trans_log_quotaoff_item(tp, qoffi); + + /* + * We have to make sure that the transaction is secure on disk before we + * return and actually stop quota accounting. So, make it synchronous. + * We don't care about quotoff's performance. + */ + xfs_trans_set_sync(tp); + return xfs_trans_commit(tp); +}
/* * Turn off quota accounting and/or enforcement for all udquots and/or @@ -541,74 +601,6 @@ out_unlock: return error; }
-STATIC int -xfs_qm_log_quotaoff_end( - struct xfs_mount *mp, - struct xfs_qoff_logitem *startqoff, - uint flags) -{ - struct xfs_trans *tp; - int error; - struct xfs_qoff_logitem *qoffi; - - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp); - if (error) - return error; - - qoffi = xfs_trans_get_qoff_item(tp, startqoff, - flags & XFS_ALL_QUOTA_ACCT); - xfs_trans_log_quotaoff_item(tp, qoffi); - - /* - * We have to make sure that the transaction is secure on disk before we - * return and actually stop quota accounting. So, make it synchronous. - * We don't care about quotoff's performance. - */ - xfs_trans_set_sync(tp); - return xfs_trans_commit(tp); -} - - -STATIC int -xfs_qm_log_quotaoff( - struct xfs_mount *mp, - struct xfs_qoff_logitem **qoffstartp, - uint flags) -{ - struct xfs_trans *tp; - int error; - struct xfs_qoff_logitem *qoffi; - - *qoffstartp = NULL; - - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp); - if (error) - goto out; - - qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT); - xfs_trans_log_quotaoff_item(tp, qoffi); - - spin_lock(&mp->m_sb_lock); - mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL; - spin_unlock(&mp->m_sb_lock); - - xfs_log_sb(tp); - - /* - * We have to make sure that the transaction is secure on disk before we - * return and actually stop quota accounting. So, make it synchronous. - * We don't care about quotoff's performance. - */ - xfs_trans_set_sync(tp); - error = xfs_trans_commit(tp); - if (error) - goto out; - - *qoffstartp = qoffi; -out: - return error; -} - /* Fill out the quota context. */ static void xfs_qm_scall_getquota_fill_qc(
From: Brian Foster bfoster@redhat.com
commit 854f82b1f6039a418b7d1407513f8640e05fd73f upstream.
AIL removal of the quotaoff start intent and free of both intents is hardcoded to the ->iop_committed() handler of the end intent. Factor out the start intent handling code so it can be used in a future patch to properly handle quotaoff errors. Use xfs_trans_ail_remove() instead of the _delete() variant to acquire the AIL lock and also handle cases where an intent might not reside in the AIL at the time of a failure.
Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_dquot_item.c | 29 ++++++++++++++++++++--------- fs/xfs/xfs_dquot_item.h | 1 + 2 files changed, 21 insertions(+), 9 deletions(-)
--- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -307,18 +307,10 @@ xfs_qm_qoffend_logitem_committed( { struct xfs_qoff_logitem *qfe = QOFF_ITEM(lip); struct xfs_qoff_logitem *qfs = qfe->qql_start_lip; - struct xfs_ail *ailp = qfs->qql_item.li_ailp;
- /* - * Delete the qoff-start logitem from the AIL. - * xfs_trans_ail_delete() drops the AIL lock. - */ - spin_lock(&ailp->ail_lock); - xfs_trans_ail_delete(ailp, &qfs->qql_item, SHUTDOWN_LOG_IO_ERROR); + xfs_qm_qoff_logitem_relse(qfs);
- kmem_free(qfs->qql_item.li_lv_shadow); kmem_free(lip->li_lv_shadow); - kmem_free(qfs); kmem_free(qfe); return (xfs_lsn_t)-1; } @@ -337,6 +329,25 @@ static const struct xfs_item_ops xfs_qm_ };
/* + * Delete the quotaoff intent from the AIL and free it. On success, + * this should only be called for the start item. It can be used for + * either on shutdown or abort. + */ +void +xfs_qm_qoff_logitem_relse( + struct xfs_qoff_logitem *qoff) +{ + struct xfs_log_item *lip = &qoff->qql_item; + + ASSERT(test_bit(XFS_LI_IN_AIL, &lip->li_flags) || + test_bit(XFS_LI_ABORTED, &lip->li_flags) || + XFS_FORCED_SHUTDOWN(lip->li_mountp)); + xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR); + kmem_free(lip->li_lv_shadow); + kmem_free(qoff); +} + +/* * Allocate and initialize an quotaoff item of the correct quota type(s). */ struct xfs_qoff_logitem * --- a/fs/xfs/xfs_dquot_item.h +++ b/fs/xfs/xfs_dquot_item.h @@ -28,6 +28,7 @@ void xfs_qm_dquot_logitem_init(struct xf struct xfs_qoff_logitem *xfs_qm_qoff_logitem_init(struct xfs_mount *mp, struct xfs_qoff_logitem *start, uint flags); +void xfs_qm_qoff_logitem_relse(struct xfs_qoff_logitem *); struct xfs_qoff_logitem *xfs_trans_get_qoff_item(struct xfs_trans *tp, struct xfs_qoff_logitem *startqoff, uint flags);
From: Brian Foster bfoster@redhat.com
commit 8a62714313391b9b2297d67c341b35edbf46c279 upstream.
AIL removal of the quotaoff start intent and free of both quotaoff intents is currently limited to the ->iop_committed() handler of the end intent. This executes when the end intent is committed to the on-disk log and marks the completion of the operation. The problem with this is that it assumes the success of the operation. If a shutdown or other error occurs during the quotaoff, it's possible for the quotaoff task to exit without removing the start intent from the AIL. This results in an unmount hang as the AIL cannot be emptied. Further, no other codepath frees the intents and so this is also a memory leak vector.
First, update the high level quotaoff error path to directly remove and free the quotaoff start intent if it still exists in the AIL at the time of the error. Next, update both the start and end quotaoff intents with an ->iop_release() callback to properly handle transaction abort.
This means that if the quotaoff start transaction aborts, it frees the start intent in the transaction commit path. If the filesystem shuts down before the end transaction allocates, the quotaoff sequence removes and frees the start intent. If the end transaction aborts, it removes the start intent and frees both. This ensures that a shutdown does not result in a hung unmount and that memory is not leaked regardless of when a quotaoff error occurs.
Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_dquot_item.c | 15 +++++++++++++++ fs/xfs/xfs_qm_syscalls.c | 13 +++++++------ 2 files changed, 22 insertions(+), 6 deletions(-)
--- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -315,17 +315,32 @@ xfs_qm_qoffend_logitem_committed( return (xfs_lsn_t)-1; }
+STATIC void +xfs_qm_qoff_logitem_release( + struct xfs_log_item *lip) +{ + struct xfs_qoff_logitem *qoff = QOFF_ITEM(lip); + + if (test_bit(XFS_LI_ABORTED, &lip->li_flags)) { + if (qoff->qql_start_lip) + xfs_qm_qoff_logitem_relse(qoff->qql_start_lip); + xfs_qm_qoff_logitem_relse(qoff); + } +} + static const struct xfs_item_ops xfs_qm_qoffend_logitem_ops = { .iop_size = xfs_qm_qoff_logitem_size, .iop_format = xfs_qm_qoff_logitem_format, .iop_committed = xfs_qm_qoffend_logitem_committed, .iop_push = xfs_qm_qoff_logitem_push, + .iop_release = xfs_qm_qoff_logitem_release, };
static const struct xfs_item_ops xfs_qm_qoff_logitem_ops = { .iop_size = xfs_qm_qoff_logitem_size, .iop_format = xfs_qm_qoff_logitem_format, .iop_push = xfs_qm_qoff_logitem_push, + .iop_release = xfs_qm_qoff_logitem_release, };
/* --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -29,8 +29,6 @@ xfs_qm_log_quotaoff( int error; struct xfs_qoff_logitem *qoffi;
- *qoffstartp = NULL; - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp); if (error) goto out; @@ -62,7 +60,7 @@ out: STATIC int xfs_qm_log_quotaoff_end( struct xfs_mount *mp, - struct xfs_qoff_logitem *startqoff, + struct xfs_qoff_logitem **startqoff, uint flags) { struct xfs_trans *tp; @@ -73,9 +71,10 @@ xfs_qm_log_quotaoff_end( if (error) return error;
- qoffi = xfs_trans_get_qoff_item(tp, startqoff, + qoffi = xfs_trans_get_qoff_item(tp, *startqoff, flags & XFS_ALL_QUOTA_ACCT); xfs_trans_log_quotaoff_item(tp, qoffi); + *startqoff = NULL;
/* * We have to make sure that the transaction is secure on disk before we @@ -103,7 +102,7 @@ xfs_qm_scall_quotaoff( uint dqtype; int error; uint inactivate_flags; - struct xfs_qoff_logitem *qoffstart; + struct xfs_qoff_logitem *qoffstart = NULL;
/* * No file system can have quotas enabled on disk but not in core. @@ -228,7 +227,7 @@ xfs_qm_scall_quotaoff( * So, we have QUOTAOFF start and end logitems; the start * logitem won't get overwritten until the end logitem appears... */ - error = xfs_qm_log_quotaoff_end(mp, qoffstart, flags); + error = xfs_qm_log_quotaoff_end(mp, &qoffstart, flags); if (error) { /* We're screwed now. Shutdown is the only option. */ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); @@ -261,6 +260,8 @@ xfs_qm_scall_quotaoff( }
out_unlock: + if (error && qoffstart) + xfs_qm_qoff_logitem_relse(qoffstart); mutex_unlock(&q->qi_quotaofflock); return error; }
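[ Note: a minimal standalone sketch (plain C, not XFS code) of the ownership rule the three cases above establish: whichever path sees the failure releases the start intent, and the reference is cleared so it is freed exactly once. All names here are illustrative. ]

#include <stdio.h>
#include <stdlib.h>

struct qoff_intent { const char *tag; };

static void qoff_relse(struct qoff_intent **qp)
{
	if (*qp) {
		printf("released %s\n", (*qp)->tag);
		free(*qp);
		*qp = NULL;	/* clear the reference so no other path frees it */
	}
}

int main(void)
{
	struct qoff_intent *qoffstart = malloc(sizeof(*qoffstart));
	int error = -5;	/* pretend the end transaction failed */

	qoffstart->tag = "quotaoff-start intent";

	/* On success the end intent's handlers free the start intent and
	 * the pointer has already been NULLed; on error the quotaoff
	 * sequence itself releases whatever is still referenced. */
	if (error && qoffstart)
		qoff_relse(&qoffstart);
	return 0;
}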
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 5885539f0af371024d07afd14974bfdc3fff84c5 upstream.
When quotacheck runs, it zeroes all the timer fields in every dquot. Unfortunately, it also does this to the root dquot, which erases any preconfigured grace intervals and warning limits that the administrator may have set. Worse yet, the incore copies of those variables remain set. This cache coherence problem manifests itself as the grace interval mysteriously being reset back to the defaults at the /next/ mount.
Fix it by not resetting the root disk dquot's timer and warning fields.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_qm.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -875,12 +875,20 @@ xfs_qm_reset_dqcounts( ddq->d_bcount = 0; ddq->d_icount = 0; ddq->d_rtbcount = 0; - ddq->d_btimer = 0; - ddq->d_itimer = 0; - ddq->d_rtbtimer = 0; - ddq->d_bwarns = 0; - ddq->d_iwarns = 0; - ddq->d_rtbwarns = 0; + + /* + * dquot id 0 stores the default grace period and the maximum + * warning limit that were set by the administrator, so we + * should not reset them. + */ + if (ddq->d_id != 0) { + ddq->d_btimer = 0; + ddq->d_itimer = 0; + ddq->d_rtbtimer = 0; + ddq->d_bwarns = 0; + ddq->d_iwarns = 0; + ddq->d_rtbwarns = 0; + }
if (xfs_sb_version_hascrc(&mp->m_sb)) { xfs_update_cksum((char *)&dqb[j],
From: Dave Chinner dchinner@redhat.com
commit 108a42358a05312b2128533c6462a3fdeb410bdf upstream.
The current CIL size aggregation limit is 1/8th the log size. This means for large logs we might be aggregating at least 250MB of dirty objects in memory before the CIL is flushed to the journal. With CIL shadow buffers sitting around, this means the CIL is often consuming >500MB of temporary memory that is all allocated under GFP_NOFS conditions.
Flushing the CIL can take some time to do if there is other IO ongoing, and can introduce substantial log force latency by itself. It also pins the memory until the objects are in the AIL and can be written back and reclaimed by shrinkers. Hence this threshold also tends to determine the minimum amount of memory XFS can operate in under heavy modification without triggering the OOM killer.
Modify the CIL space limit to prevent such huge amounts of pinned metadata from aggregating. We can have 2MB of log IO in flight at once, so limit aggregation to 16x this size. This threshold was chosen as it has little impact on performance (on 16-way fsmark) or log traffic but pins a lot less memory on large logs, especially under heavy memory pressure. An aggregation limit of 8x had 5-10% performance degradation and a 50% increase in log throughput for the same workload, so clearly that was too small for highly concurrent workloads on large logs.
This was found via trace analysis of AIL behaviour. e.g. insertion from a single CIL flush:
xfs_ail_insert: old lsn 0/0 new lsn 1/3033090 type XFS_LI_INODE flags IN_AIL
$ grep xfs_ail_insert /mnt/scratch/s.t | grep "new lsn 1/3033090" | wc -l
1721823
$
So there were 1.7 million objects inserted into the AIL from this CIL checkpoint, the first at 2323.392108, the last at 2325.667566, which was the end of the trace (i.e. it hadn't finished). Clearly a major problem.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Allison Collins allison.henderson@oracle.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log_priv.h | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-)
--- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -323,13 +323,30 @@ struct xfs_cil { * tries to keep 25% of the log free, so we need to keep below that limit or we * risk running out of free log space to start any new transactions. * - * In order to keep background CIL push efficient, we will set a lower - * threshold at which background pushing is attempted without blocking current - * transaction commits. A separate, higher bound defines when CIL pushes are - * enforced to ensure we stay within our maximum checkpoint size bounds. - * threshold, yet give us plenty of space for aggregation on large logs. + * In order to keep background CIL push efficient, we only need to ensure the + * CIL is large enough to maintain sufficient in-memory relogging to avoid + * repeated physical writes of frequently modified metadata. If we allow the CIL + * to grow to a substantial fraction of the log, then we may be pinning hundreds + * of megabytes of metadata in memory until the CIL flushes. This can cause + * issues when we are running low on memory - pinned memory cannot be reclaimed, + * and the CIL consumes a lot of memory. Hence we need to set an upper physical + * size limit for the CIL that limits the maximum amount of memory pinned by the + * CIL but does not limit performance by reducing relogging efficiency + * significantly. + * + * As such, the CIL push threshold ends up being the smaller of two thresholds: + * - a threshold large enough that it allows CIL to be pushed and progress to be + * made without excessive blocking of incoming transaction commits. This is + * defined to be 12.5% of the log space - half the 25% push threshold of the + * AIL. + * - small enough that it doesn't pin excessive amounts of memory but maintains + * close to peak relogging efficiency. This is defined to be 16x the iclog + * buffer window (32MB) as measurements have shown this to be roughly the + * point of diminishing performance increases under highly concurrent + * modification workloads. */ -#define XLOG_CIL_SPACE_LIMIT(log) (log->l_logsize >> 3) +#define XLOG_CIL_SPACE_LIMIT(log) \ + min_t(int, (log)->l_logsize >> 3, BBTOB(XLOG_TOTAL_REC_SHIFT(log)) << 4)
/* * ticket grant locks, queues and accounting have their own cachlines
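[ Note: a standalone sketch of the threshold arithmetic above, not kernel code. It assumes the figures from the changelog: 2MB of log IO in flight, capped at 16x that size (32MB); the helper names are illustrative. ]

#include <stdio.h>

#define MB (1024LL * 1024)

static long long old_limit(long long logsize)
{
	return logsize >> 3;		/* 1/8th of the log */
}

static long long new_limit(long long logsize)
{
	long long window = 16 * 2 * MB;	/* 16x the 2MB of log IO in flight */

	return old_limit(logsize) < window ? old_limit(logsize) : window;
}

int main(void)
{
	long long sizes[] = { 64 * MB, 512 * MB, 2048 * MB };
	int i;

	for (i = 0; i < 3; i++)
		printf("log %4lldMB: old push threshold %4lldMB, new %lldMB\n",
		       sizes[i] / MB, old_limit(sizes[i]) / MB,
		       new_limit(sizes[i]) / MB);
	return 0;
}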
From: Dave Chinner dchinner@redhat.com
commit 0e7ab7efe77451cba4cbecb6c9f5ef83cf32b36b upstream.
In certain situations the background CIL push can be indefinitely delayed. While we have workarounds for the obvious cases now, they don't solve the underlying issue. This issue is that there is no upper limit on the CIL where we will either force or wait for a background push to start, hence allowing the CIL to grow without bound until it consumes all log space.
To fix this, add a new wait queue to the CIL which allows background pushes to wait for the CIL context to be switched out. This happens when the push starts, so it will allow us to block incoming transaction commit completion until the push has started. This will only affect processes that are running modifications, and only when the CIL threshold has been significantly overrun.
This has no apparent impact on performance, and doesn't even trigger until over 45 million inodes have been created in a 16-way fsmark test on a 2GB log. That was limiting at 64MB of log space used, so the active CIL size is only about 3% of the total log in that case. The concurrent removal of those files did not trigger the background sleep at all.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Allison Collins allison.henderson@oracle.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log_cil.c | 37 +++++++++++++++++++++++++++++++++---- fs/xfs/xfs_log_priv.h | 24 ++++++++++++++++++++++++ fs/xfs/xfs_trace.h | 1 + 3 files changed, 58 insertions(+), 4 deletions(-)
--- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -671,6 +671,11 @@ xlog_cil_push( ASSERT(push_seq <= ctx->sequence);
/* + * Wake up any background push waiters now this context is being pushed. + */ + wake_up_all(&ctx->push_wait); + + /* * Check if we've anything to push. If there is nothing, then we don't * move on to a new sequence number and so we have to be able to push * this sequence again later. @@ -746,6 +751,7 @@ xlog_cil_push( */ INIT_LIST_HEAD(&new_ctx->committing); INIT_LIST_HEAD(&new_ctx->busy_extents); + init_waitqueue_head(&new_ctx->push_wait); new_ctx->sequence = ctx->sequence + 1; new_ctx->cil = cil; cil->xc_ctx = new_ctx; @@ -900,7 +906,7 @@ xlog_cil_push_work( */ static void xlog_cil_push_background( - struct xlog *log) + struct xlog *log) __releases(cil->xc_ctx_lock) { struct xfs_cil *cil = log->l_cilp;
@@ -914,14 +920,36 @@ xlog_cil_push_background( * don't do a background push if we haven't used up all the * space available yet. */ - if (cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log)) + if (cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log)) { + up_read(&cil->xc_ctx_lock); return; + }
spin_lock(&cil->xc_push_lock); if (cil->xc_push_seq < cil->xc_current_sequence) { cil->xc_push_seq = cil->xc_current_sequence; queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work); } + + /* + * Drop the context lock now, we can't hold that if we need to sleep + * because we are over the blocking threshold. The push_lock is still + * held, so blocking threshold sleep/wakeup is still correctly + * serialised here. + */ + up_read(&cil->xc_ctx_lock); + + /* + * If we are well over the space limit, throttle the work that is being + * done until the push work on this context has begun. + */ + if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) { + trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket); + ASSERT(cil->xc_ctx->space_used < log->l_logsize); + xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock); + return; + } + spin_unlock(&cil->xc_push_lock);
} @@ -1038,9 +1066,9 @@ xfs_log_commit_cil( if (lip->li_ops->iop_committing) lip->li_ops->iop_committing(lip, xc_commit_lsn); } - xlog_cil_push_background(log);
- up_read(&cil->xc_ctx_lock); + /* xlog_cil_push_background() releases cil->xc_ctx_lock */ + xlog_cil_push_background(log); }
/* @@ -1199,6 +1227,7 @@ xlog_cil_init(
INIT_LIST_HEAD(&ctx->committing); INIT_LIST_HEAD(&ctx->busy_extents); + init_waitqueue_head(&ctx->push_wait); ctx->sequence = 1; ctx->cil = cil; cil->xc_ctx = ctx; --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -247,6 +247,7 @@ struct xfs_cil_ctx { struct xfs_log_vec *lv_chain; /* logvecs being pushed */ struct list_head iclog_entry; struct list_head committing; /* ctx committing list */ + wait_queue_head_t push_wait; /* background push throttle */ struct work_struct discard_endio_work; };
@@ -344,10 +345,33 @@ struct xfs_cil { * buffer window (32MB) as measurements have shown this to be roughly the * point of diminishing performance increases under highly concurrent * modification workloads. + * + * To prevent the CIL from overflowing upper commit size bounds, we introduce a + * new threshold at which we block committing transactions until the background + * CIL commit commences and switches to a new context. While this is not a hard + * limit, it forces the process committing a transaction to the CIL to block and + * yeild the CPU, giving the CIL push work a chance to be scheduled and start + * work. This prevents a process running lots of transactions from overfilling + * the CIL because it is not yielding the CPU. We set the blocking limit at + * twice the background push space threshold so we keep in line with the AIL + * push thresholds. + * + * Note: this is not a -hard- limit as blocking is applied after the transaction + * is inserted into the CIL and the push has been triggered. It is largely a + * throttling mechanism that allows the CIL push to be scheduled and run. A hard + * limit will be difficult to implement without introducing global serialisation + * in the CIL commit fast path, and it's not at all clear that we actually need + * such hard limits given the ~7 years we've run without a hard limit before + * finding the first situation where a checkpoint size overflow actually + * occurred. Hence the simple throttle, and an ASSERT check to tell us that + * we've overrun the max size. */ #define XLOG_CIL_SPACE_LIMIT(log) \ min_t(int, (log)->l_logsize >> 3, BBTOB(XLOG_TOTAL_REC_SHIFT(log)) << 4)
+#define XLOG_CIL_BLOCKING_SPACE_LIMIT(log) \ + (XLOG_CIL_SPACE_LIMIT(log) * 2) + /* * ticket grant locks, queues and accounting have their own cachlines * as these are quite hot and can be operated on concurrently. --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1011,6 +1011,7 @@ DEFINE_LOGGRANT_EVENT(xfs_log_regrant_re DEFINE_LOGGRANT_EVENT(xfs_log_ungrant_enter); DEFINE_LOGGRANT_EVENT(xfs_log_ungrant_exit); DEFINE_LOGGRANT_EVENT(xfs_log_ungrant_sub); +DEFINE_LOGGRANT_EVENT(xfs_log_cil_wait);
DECLARE_EVENT_CLASS(xfs_log_item_class, TP_PROTO(struct xfs_log_item *lip),
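[ Note: an illustrative calculation, not kernel code. For the 2GB log in the fsmark test above, the background push threshold is min(2048MB / 8, 32MB) = 32MB and the blocking threshold is twice that, which matches the 64MB of log space the test was limited at. ]

#include <stdio.h>

#define MB (1024LL * 1024)

int main(void)
{
	long long logsize = 2048 * MB;	/* the 2GB log from the test above */
	long long push = logsize / 8 < 32 * MB ? logsize / 8 : 32 * MB;
	long long block = 2 * push;	/* XLOG_CIL_BLOCKING_SPACE_LIMIT */

	printf("background push at %lldMB, throttle commits at %lldMB\n",
	       push / MB, block / MB);
	return 0;
}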
From: Dave Chinner dchinner@redhat.com
commit 4165994ac9672d91134675caa6de3645a9ace6c8 upstream.
Factor the common AIL deletion code that does all the wakeups into a helper, so we only have one copy of this somewhat tricky code that has to interface with all the wakeups necessary when the LSN of the log tail changes.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Allison Collins allison.henderson@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_inode_item.c | 12 +----------- fs/xfs/xfs_trans_ail.c | 48 ++++++++++++++++++++++++++---------------------- fs/xfs/xfs_trans_priv.h | 4 +++- 3 files changed, 30 insertions(+), 34 deletions(-)
--- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -744,17 +744,7 @@ xfs_iflush_done( xfs_clear_li_failed(blip); } } - - if (mlip_changed) { - if (!XFS_FORCED_SHUTDOWN(ailp->ail_mount)) - xlog_assign_tail_lsn_locked(ailp->ail_mount); - if (list_empty(&ailp->ail_head)) - wake_up_all(&ailp->ail_empty); - } - spin_unlock(&ailp->ail_lock); - - if (mlip_changed) - xfs_log_space_wake(ailp->ail_mount); + xfs_ail_update_finish(ailp, mlip_changed); }
/* --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -680,6 +680,27 @@ xfs_ail_push_all_sync( finish_wait(&ailp->ail_empty, &wait); }
+void +xfs_ail_update_finish( + struct xfs_ail *ailp, + bool do_tail_update) __releases(ailp->ail_lock) +{ + struct xfs_mount *mp = ailp->ail_mount; + + if (!do_tail_update) { + spin_unlock(&ailp->ail_lock); + return; + } + + if (!XFS_FORCED_SHUTDOWN(mp)) + xlog_assign_tail_lsn_locked(mp); + + if (list_empty(&ailp->ail_head)) + wake_up_all(&ailp->ail_empty); + spin_unlock(&ailp->ail_lock); + xfs_log_space_wake(mp); +} + /* * xfs_trans_ail_update - bulk AIL insertion operation. * @@ -739,15 +760,7 @@ xfs_trans_ail_update_bulk( if (!list_empty(&tmp)) xfs_ail_splice(ailp, cur, &tmp, lsn);
- if (mlip_changed) { - if (!XFS_FORCED_SHUTDOWN(ailp->ail_mount)) - xlog_assign_tail_lsn_locked(ailp->ail_mount); - spin_unlock(&ailp->ail_lock); - - xfs_log_space_wake(ailp->ail_mount); - } else { - spin_unlock(&ailp->ail_lock); - } + xfs_ail_update_finish(ailp, mlip_changed); }
bool @@ -791,10 +804,10 @@ void xfs_trans_ail_delete( struct xfs_ail *ailp, struct xfs_log_item *lip, - int shutdown_type) __releases(ailp->ail_lock) + int shutdown_type) { struct xfs_mount *mp = ailp->ail_mount; - bool mlip_changed; + bool need_update;
if (!test_bit(XFS_LI_IN_AIL, &lip->li_flags)) { spin_unlock(&ailp->ail_lock); @@ -807,17 +820,8 @@ xfs_trans_ail_delete( return; }
- mlip_changed = xfs_ail_delete_one(ailp, lip); - if (mlip_changed) { - if (!XFS_FORCED_SHUTDOWN(mp)) - xlog_assign_tail_lsn_locked(mp); - if (list_empty(&ailp->ail_head)) - wake_up_all(&ailp->ail_empty); - } - - spin_unlock(&ailp->ail_lock); - if (mlip_changed) - xfs_log_space_wake(ailp->ail_mount); + need_update = xfs_ail_delete_one(ailp, lip); + xfs_ail_update_finish(ailp, need_update); }
int --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -92,8 +92,10 @@ xfs_trans_ail_update( }
bool xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip); +void xfs_ail_update_finish(struct xfs_ail *ailp, bool do_tail_update) + __releases(ailp->ail_lock); void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip, - int shutdown_type) __releases(ailp->ail_lock); + int shutdown_type);
static inline void xfs_trans_ail_remove(
From: Dave Chinner dchinner@redhat.com
commit 8eb807bd839938b45bf7a97f0568d2a845ba6929 upstream.
We currently wake anything waiting on the log tail to move whenever the log item at the tail of the log is removed. Historically this was fine behaviour because there were very few items at any given LSN. But with delayed logging, there may be thousands of items at any given LSN, and we can't move the tail until they are all gone.
Hence if we are removing them in near tail-first order, we might be waking up processes waiting on the tail LSN to change (e.g. log space waiters) repeatedly without them being able to make progress. This also occurs with the new sync push waiters, and can result in thousands of spurious wakeups every second when under heavy direct reclaim pressure.
To fix this, check that the tail LSN has actually changed on the AIL before triggering wakeups. This will reduce the number of spurious wakeups when doing bulk AIL removal and make this code much more efficient.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Allison Collins allison.henderson@oracle.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_inode_item.c | 18 ++++++++++++---- fs/xfs/xfs_trans_ail.c | 52 +++++++++++++++++++++++++++++++++--------------- fs/xfs/xfs_trans_priv.h | 4 +-- 3 files changed, 51 insertions(+), 23 deletions(-)
--- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -732,19 +732,27 @@ xfs_iflush_done( * holding the lock before removing the inode from the AIL. */ if (need_ail) { - bool mlip_changed = false; + xfs_lsn_t tail_lsn = 0;
/* this is an opencoded batch version of xfs_trans_ail_delete */ spin_lock(&ailp->ail_lock); list_for_each_entry(blip, &tmp, li_bio_list) { if (INODE_ITEM(blip)->ili_logged && - blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) - mlip_changed |= xfs_ail_delete_one(ailp, blip); - else { + blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) { + /* + * xfs_ail_update_finish() only cares about the + * lsn of the first tail item removed, any + * others will be at the same or higher lsn so + * we just ignore them. + */ + xfs_lsn_t lsn = xfs_ail_delete_one(ailp, blip); + if (!tail_lsn && lsn) + tail_lsn = lsn; + } else { xfs_clear_li_failed(blip); } } - xfs_ail_update_finish(ailp, mlip_changed); + xfs_ail_update_finish(ailp, tail_lsn); }
/* --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -108,17 +108,25 @@ xfs_ail_next( * We need the AIL lock in order to get a coherent read of the lsn of the last * item in the AIL. */ +static xfs_lsn_t +__xfs_ail_min_lsn( + struct xfs_ail *ailp) +{ + struct xfs_log_item *lip = xfs_ail_min(ailp); + + if (lip) + return lip->li_lsn; + return 0; +} + xfs_lsn_t xfs_ail_min_lsn( struct xfs_ail *ailp) { - xfs_lsn_t lsn = 0; - struct xfs_log_item *lip; + xfs_lsn_t lsn;
spin_lock(&ailp->ail_lock); - lip = xfs_ail_min(ailp); - if (lip) - lsn = lip->li_lsn; + lsn = __xfs_ail_min_lsn(ailp); spin_unlock(&ailp->ail_lock);
return lsn; @@ -683,11 +691,12 @@ xfs_ail_push_all_sync( void xfs_ail_update_finish( struct xfs_ail *ailp, - bool do_tail_update) __releases(ailp->ail_lock) + xfs_lsn_t old_lsn) __releases(ailp->ail_lock) { struct xfs_mount *mp = ailp->ail_mount;
- if (!do_tail_update) { + /* if the tail lsn hasn't changed, don't do updates or wakeups. */ + if (!old_lsn || old_lsn == __xfs_ail_min_lsn(ailp)) { spin_unlock(&ailp->ail_lock); return; } @@ -732,7 +741,7 @@ xfs_trans_ail_update_bulk( xfs_lsn_t lsn) __releases(ailp->ail_lock) { struct xfs_log_item *mlip; - int mlip_changed = 0; + xfs_lsn_t tail_lsn = 0; int i; LIST_HEAD(tmp);
@@ -747,9 +756,10 @@ xfs_trans_ail_update_bulk( continue;
trace_xfs_ail_move(lip, lip->li_lsn, lsn); + if (mlip == lip && !tail_lsn) + tail_lsn = lip->li_lsn; + xfs_ail_delete(ailp, lip); - if (mlip == lip) - mlip_changed = 1; } else { trace_xfs_ail_insert(lip, 0, lsn); } @@ -760,15 +770,23 @@ xfs_trans_ail_update_bulk( if (!list_empty(&tmp)) xfs_ail_splice(ailp, cur, &tmp, lsn);
- xfs_ail_update_finish(ailp, mlip_changed); + xfs_ail_update_finish(ailp, tail_lsn); }
-bool +/* + * Delete one log item from the AIL. + * + * If this item was at the tail of the AIL, return the LSN of the log item so + * that we can use it to check if the LSN of the tail of the log has moved + * when finishing up the AIL delete process in xfs_ail_update_finish(). + */ +xfs_lsn_t xfs_ail_delete_one( struct xfs_ail *ailp, struct xfs_log_item *lip) { struct xfs_log_item *mlip = xfs_ail_min(ailp); + xfs_lsn_t lsn = lip->li_lsn;
trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); xfs_ail_delete(ailp, lip); @@ -776,7 +794,9 @@ xfs_ail_delete_one( clear_bit(XFS_LI_IN_AIL, &lip->li_flags); lip->li_lsn = 0;
- return mlip == lip; + if (mlip == lip) + return lsn; + return 0; }
/** @@ -807,7 +827,7 @@ xfs_trans_ail_delete( int shutdown_type) { struct xfs_mount *mp = ailp->ail_mount; - bool need_update; + xfs_lsn_t tail_lsn;
if (!test_bit(XFS_LI_IN_AIL, &lip->li_flags)) { spin_unlock(&ailp->ail_lock); @@ -820,8 +840,8 @@ xfs_trans_ail_delete( return; }
- need_update = xfs_ail_delete_one(ailp, lip); - xfs_ail_update_finish(ailp, need_update); + tail_lsn = xfs_ail_delete_one(ailp, lip); + xfs_ail_update_finish(ailp, tail_lsn); }
int --- a/fs/xfs/xfs_trans_priv.h +++ b/fs/xfs/xfs_trans_priv.h @@ -91,8 +91,8 @@ xfs_trans_ail_update( xfs_trans_ail_update_bulk(ailp, NULL, &lip, 1, lsn); }
-bool xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip); -void xfs_ail_update_finish(struct xfs_ail *ailp, bool do_tail_update) +xfs_lsn_t xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip); +void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn) __releases(ailp->ail_lock); void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip, int shutdown_type);
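[ Note: a standalone model of the new wakeup check, not the XFS code. With delayed logging thousands of items can share one LSN, so removing a tail item only triggers a wakeup once the minimum LSN actually moves. ]

#include <stdio.h>

static long ail_min_lsn = 100;	/* current tail LSN of a pretend AIL */

static void update_finish(long old_lsn)
{
	/* if the tail lsn hasn't changed, don't do updates or wakeups */
	if (!old_lsn || old_lsn == ail_min_lsn) {
		printf("old=%ld: no wakeup\n", old_lsn);
		return;
	}
	printf("old=%ld, tail now %ld: wake log space waiters\n",
	       old_lsn, ail_min_lsn);
}

int main(void)
{
	update_finish(0);	/* removed item was not at the tail */
	update_finish(100);	/* tail item removed, but other items at
				 * LSN 100 remain, so no wakeup yet */
	ail_min_lsn = 250;
	update_finish(100);	/* the tail finally moved: wake waiters */
	return 0;
}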
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 5cc3c006eb45524860c4d1dd4dd7ad4a506bf3f5 upstream.
[ Modify fs/xfs/xfs_log.c to include the changes at locations suitable for 5.4-lts kernel ]
In commit f467cad95f5e3, I added the ability to force a recalculation of the filesystem summary counters if they seemed incorrect. This was done (not entirely correctly) by tweaking the log code to write an unmount record without the UMOUNT_TRANS flag set. At next mount, the log recovery code will fail to find the unmount record and go into recovery, which triggers the recalculation.
What actually gets written to the log is what ought to be an unmount record, but without any flags set to indicate what kind of record it actually is. This worked to trigger the recalculation, but we shouldn't write bogus log records when we could simply write nothing.
Fixes: f467cad95f5e3 ("xfs: force summary counter recalc at next mount") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-)
--- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -837,19 +837,6 @@ xfs_log_write_unmount_record( if (error) goto out_err;
- /* - * If we think the summary counters are bad, clear the unmount header - * flag in the unmount record so that the summary counters will be - * recalculated during log recovery at next mount. Refer to - * xlog_check_unmount_rec for more details. - */ - if (XFS_TEST_ERROR(xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS), mp, - XFS_ERRTAG_FORCE_SUMMARY_RECALC)) { - xfs_alert(mp, "%s: will fix summary counters at next mount", - __func__); - flags &= ~XLOG_UNMOUNT_TRANS; - } - /* remove inited flag, and account for space used */ tic->t_flags = 0; tic->t_curr_res -= sizeof(magic); @@ -932,6 +919,19 @@ xfs_log_unmount_write(xfs_mount_t *mp) } while (iclog != first_iclog); #endif if (! (XLOG_FORCED_SHUTDOWN(log))) { + /* + * If we think the summary counters are bad, avoid writing the + * unmount record to force log recovery at next mount, after + * which the summary counters will be recalculated. Refer to + * xlog_check_unmount_rec for more details. + */ + if (XFS_TEST_ERROR(xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS), + mp, XFS_ERRTAG_FORCE_SUMMARY_RECALC)) { + xfs_alert(mp, + "%s: will fix summary counters at next mount", + __func__); + return 0; + } xfs_log_write_unmount_record(mp); } else { /*
From: Brian Foster bfoster@redhat.com
commit 8d3d7e2b35ea7d91d6e085c93b5efecfb0fba307 upstream.
A dquot flush currently blocks on the buffer lock for the underlying dquot buffer. In turn, this causes xfsaild to block rather than continue processing other items in the meantime. Update xfs_qm_dqflush() to trylock the buffer, similar to how inode buffers are handled, and return -EAGAIN if the lock fails. Fix up any callers that don't currently handle the error properly.
Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_dquot.c | 6 +++--- fs/xfs/xfs_dquot_item.c | 3 ++- fs/xfs/xfs_qm.c | 14 +++++++++----- 3 files changed, 14 insertions(+), 9 deletions(-)
--- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1105,8 +1105,8 @@ xfs_qm_dqflush( * Get the buffer containing the on-disk dquot */ error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dqp->q_blkno, - mp->m_quotainfo->qi_dqchunklen, 0, &bp, - &xfs_dquot_buf_ops); + mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK, + &bp, &xfs_dquot_buf_ops); if (error) goto out_unlock;
@@ -1176,7 +1176,7 @@ xfs_qm_dqflush(
out_unlock: xfs_dqfunlock(dqp); - return -EIO; + return error; }
/* --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -189,7 +189,8 @@ xfs_qm_dquot_logitem_push( if (!xfs_buf_delwri_queue(bp, buffer_list)) rval = XFS_ITEM_FLUSHING; xfs_buf_relse(bp); - } + } else if (error == -EAGAIN) + rval = XFS_ITEM_LOCKED;
spin_lock(&lip->li_ailp->ail_lock); out_unlock: --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -121,12 +121,11 @@ xfs_qm_dqpurge( { struct xfs_mount *mp = dqp->q_mount; struct xfs_quotainfo *qi = mp->m_quotainfo; + int error = -EAGAIN;
xfs_dqlock(dqp); - if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) { - xfs_dqunlock(dqp); - return -EAGAIN; - } + if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) + goto out_unlock;
dqp->dq_flags |= XFS_DQ_FREEING;
@@ -139,7 +138,6 @@ xfs_qm_dqpurge( */ if (XFS_DQ_IS_DIRTY(dqp)) { struct xfs_buf *bp = NULL; - int error;
/* * We don't care about getting disk errors here. We need @@ -149,6 +147,8 @@ xfs_qm_dqpurge( if (!error) { error = xfs_bwrite(bp); xfs_buf_relse(bp); + } else if (error == -EAGAIN) { + goto out_unlock; } xfs_dqflock(dqp); } @@ -174,6 +174,10 @@ xfs_qm_dqpurge(
xfs_qm_dqdestroy(dqp); return 0; + +out_unlock: + xfs_dqunlock(dqp); + return error; }
/*
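[ Note: a minimal pthreads sketch of the trylock-or-EAGAIN pattern the patch applies to the dquot buffer; the names are illustrative, not the XFS API. The point is that a flusher skips a busy object instead of stalling on it. ]

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

static int flush_one(void)
{
	/* fail fast instead of blocking, like XBF_TRYLOCK in the patch */
	if (pthread_mutex_trylock(&buf_lock) != 0)
		return -EAGAIN;
	/* ... write the object back ... */
	pthread_mutex_unlock(&buf_lock);
	return 0;
}

int main(void)
{
	pthread_mutex_lock(&buf_lock);	/* simulate a busy buffer */
	printf("busy: flush_one() = %d (-EAGAIN is %d)\n", flush_one(), -EAGAIN);
	pthread_mutex_unlock(&buf_lock);
	printf("idle: flush_one() = %d\n", flush_one());
	return 0;
}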
From: Christoph Hellwig hch@lst.de
commit 54fbdd1035e3a4e4f4082c335b095426cdefd092 upstream.
Create a new helper to force the log up to the last LSN touching an inode.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_export.c | 14 +------------- fs/xfs/xfs_file.c | 12 +----------- fs/xfs/xfs_inode.c | 19 +++++++++++++++++++ fs/xfs/xfs_inode.h | 1 + 4 files changed, 22 insertions(+), 24 deletions(-)
--- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -15,7 +15,6 @@ #include "xfs_trans.h" #include "xfs_inode_item.h" #include "xfs_icache.h" -#include "xfs_log.h" #include "xfs_pnfs.h"
/* @@ -221,18 +220,7 @@ STATIC int xfs_fs_nfs_commit_metadata( struct inode *inode) { - struct xfs_inode *ip = XFS_I(inode); - struct xfs_mount *mp = ip->i_mount; - xfs_lsn_t lsn = 0; - - xfs_ilock(ip, XFS_ILOCK_SHARED); - if (xfs_ipincount(ip)) - lsn = ip->i_itemp->ili_last_lsn; - xfs_iunlock(ip, XFS_ILOCK_SHARED); - - if (!lsn) - return 0; - return xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, NULL); + return xfs_log_force_inode(XFS_I(inode)); }
const struct export_operations xfs_export_operations = { --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -80,19 +80,9 @@ xfs_dir_fsync( int datasync) { struct xfs_inode *ip = XFS_I(file->f_mapping->host); - struct xfs_mount *mp = ip->i_mount; - xfs_lsn_t lsn = 0;
trace_xfs_dir_fsync(ip); - - xfs_ilock(ip, XFS_ILOCK_SHARED); - if (xfs_ipincount(ip)) - lsn = ip->i_itemp->ili_last_lsn; - xfs_iunlock(ip, XFS_ILOCK_SHARED); - - if (!lsn) - return 0; - return xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, NULL); + return xfs_log_force_inode(ip); }
STATIC int --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3973,3 +3973,22 @@ xfs_irele( trace_xfs_irele(ip, _RET_IP_); iput(VFS_I(ip)); } + +/* + * Ensure all commited transactions touching the inode are written to the log. + */ +int +xfs_log_force_inode( + struct xfs_inode *ip) +{ + xfs_lsn_t lsn = 0; + + xfs_ilock(ip, XFS_ILOCK_SHARED); + if (xfs_ipincount(ip)) + lsn = ip->i_itemp->ili_last_lsn; + xfs_iunlock(ip, XFS_ILOCK_SHARED); + + if (!lsn) + return 0; + return xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC, NULL); +} --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -441,6 +441,7 @@ int xfs_itruncate_extents_flags(struct struct xfs_inode *, int, xfs_fsize_t, int); void xfs_iext_realloc(xfs_inode_t *, int, int);
+int xfs_log_force_inode(struct xfs_inode *ip); void xfs_iunpin_wait(xfs_inode_t *); #define xfs_ipincount(ip) ((unsigned int) atomic_read(&ip->i_pincount))
From: Christoph Hellwig hch@lst.de
commit 5833112df7e9a306af9af09c60127b92ed723962 upstream.
Reflink should force the log out to disk if the filesystem was mounted with wsync, the same as most other operations in xfs.
[Note: XFS_MOUNT_WSYNC is set when the admin mounts the filesystem with either the 'wsync' or 'sync' mount options, which effectively means that we're classifying reflink/dedupe as IO operations and making them synchronous when required.]
Fixes: 3fc9f5e409319 ("xfs: remove xfs_reflink_remap_range") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com [darrick: add more to the changelog] Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_file.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1044,7 +1044,11 @@ xfs_file_remap_range(
ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize, remap_flags); + if (ret) + goto out_unlock;
+ if (mp->m_flags & XFS_MOUNT_WSYNC) + xfs_log_force_inode(dest); out_unlock: xfs_reflink_remap_unlock(file_in, file_out); if (ret)
From: "Darrick J. Wong" darrick.wong@oracle.com
commit f0f7a674d4df1510d8ca050a669e1420cf7d7fab upstream.
[ Modify fs/xfs/xfs_super.c to include the changes at locations suitable for 5.4-lts kernel ]
Move the inode dirty data flushing to a workqueue so that multiple threads can take advantage of a single thread's flushing work. The ratelimiting technique used in bdd4ee4 was not successful, because threads that skipped the inode flush scan due to ratelimiting would ENOSPC early, which caused occasional (but noticeable) changes in behavior and sporadic fstest regressions.
Therefore, make all the writer threads wait on a single inode flush, which eliminates both the stampeding hordes of flushers and the small window in which a write could fail with ENOSPC because it lost the ratelimit race after even another thread freed space.
Fixes: c6425702f21e ("xfs: ratelimit inode flush on buffered write ENOSPC") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_mount.h | 5 +++++ fs/xfs/xfs_super.c | 28 +++++++++++++++++++++++----- 2 files changed, 28 insertions(+), 5 deletions(-)
--- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -179,6 +179,11 @@ typedef struct xfs_mount { struct xfs_error_cfg m_error_cfg[XFS_ERR_CLASS_MAX][XFS_ERR_ERRNO_MAX]; struct xstats m_stats; /* per-fs stats */
+ /* + * Workqueue item so that we can coalesce multiple inode flush attempts + * into a single flush. + */ + struct work_struct m_flush_inodes_work; struct workqueue_struct *m_buf_workqueue; struct workqueue_struct *m_unwritten_workqueue; struct workqueue_struct *m_cil_workqueue; --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -840,6 +840,20 @@ xfs_destroy_mount_workqueues( destroy_workqueue(mp->m_buf_workqueue); }
+static void +xfs_flush_inodes_worker( + struct work_struct *work) +{ + struct xfs_mount *mp = container_of(work, struct xfs_mount, + m_flush_inodes_work); + struct super_block *sb = mp->m_super; + + if (down_read_trylock(&sb->s_umount)) { + sync_inodes_sb(sb); + up_read(&sb->s_umount); + } +} + /* * Flush all dirty data to disk. Must not be called while holding an XFS_ILOCK * or a page lock. We use sync_inodes_sb() here to ensure we block while waiting @@ -850,12 +864,15 @@ void xfs_flush_inodes( struct xfs_mount *mp) { - struct super_block *sb = mp->m_super; + /* + * If flush_work() returns true then that means we waited for a flush + * which was already in progress. Don't bother running another scan. + */ + if (flush_work(&mp->m_flush_inodes_work)) + return;
- if (down_read_trylock(&sb->s_umount)) { - sync_inodes_sb(sb); - up_read(&sb->s_umount); - } + queue_work(mp->m_sync_workqueue, &mp->m_flush_inodes_work); + flush_work(&mp->m_flush_inodes_work); }
/* Catch misguided souls that try to use this interface on XFS */ @@ -1532,6 +1549,7 @@ xfs_mount_alloc( spin_lock_init(&mp->m_perag_lock); mutex_init(&mp->m_growlock); atomic_set(&mp->m_active_trans, 0); + INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker); INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker); INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker); INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
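[ Note: a toy standalone model of the coalescing logic, not kernel code; the stubs stand in for flush_work()/queue_work(). If flush_work() had to wait for a flush that was already running, that flush covered this caller too, so no new scan is scheduled. ]

#include <stdbool.h>
#include <stdio.h>

static bool work_running = true;	/* pretend a flush is in flight */

static bool flush_work_stub(void)
{
	bool waited = work_running;

	work_running = false;		/* "wait" for it to complete */
	return waited;
}

static void queue_work_stub(void)
{
	work_running = true;
	puts("queued a fresh inode flush");
}

static void flush_inodes(void)
{
	if (flush_work_stub())		/* an in-flight flush covered us */
		return;
	queue_work_stub();		/* otherwise start one... */
	flush_work_stub();		/* ...and wait for it */
}

int main(void)
{
	flush_inodes();			/* coalesces with the running flush */
	flush_inodes();			/* nothing running: queue and wait */
	return 0;
}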
From: Dave Chinner dchinner@redhat.com
commit c7f87f3984cfa1e6d32806a715f35c5947ad9c09 upstream.
xlog_wait() on the CIL context can reference a freed context if the waiter doesn't get scheduled before the CIL context is freed. This can happen when a task is on the hard throttle and the CIL push aborts due to a shutdown. This was detected by generic/019:
thread 1                        thread 2

__xfs_trans_commit
 xfs_log_commit_cil
  <CIL size over hard throttle limit>
  xlog_wait
   schedule
                                xlog_cil_push_work
                                wake_up_all
                                <shutdown aborts commit>
                                xlog_cil_committed
                                kmem_free

  remove_wait_queue
  spin_lock_irqsave  --> UAF
Fix it by moving the wait queue to the CIL rather than keeping it in the CIL context that gets freed on push completion. Because the wait queue is now independent of the CIL context and we might have multiple contexts in flight at once, only wake the waiters on the push throttle when the context we are pushing is over the hard throttle size threshold.
Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push") Reported-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log_cil.c | 10 +++++----- fs/xfs/xfs_log_priv.h | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-)
--- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -673,7 +673,8 @@ xlog_cil_push( /* * Wake up any background push waiters now this context is being pushed. */ - wake_up_all(&ctx->push_wait); + if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) + wake_up_all(&cil->xc_push_wait);
/* * Check if we've anything to push. If there is nothing, then we don't @@ -745,13 +746,12 @@ xlog_cil_push(
/* * initialise the new context and attach it to the CIL. Then attach - * the current context to the CIL committing lsit so it can be found + * the current context to the CIL committing list so it can be found * during log forces to extract the commit lsn of the sequence that * needs to be forced. */ INIT_LIST_HEAD(&new_ctx->committing); INIT_LIST_HEAD(&new_ctx->busy_extents); - init_waitqueue_head(&new_ctx->push_wait); new_ctx->sequence = ctx->sequence + 1; new_ctx->cil = cil; cil->xc_ctx = new_ctx; @@ -946,7 +946,7 @@ xlog_cil_push_background( if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) { trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket); ASSERT(cil->xc_ctx->space_used < log->l_logsize); - xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock); + xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock); return; }
@@ -1222,12 +1222,12 @@ xlog_cil_init( INIT_LIST_HEAD(&cil->xc_committing); spin_lock_init(&cil->xc_cil_lock); spin_lock_init(&cil->xc_push_lock); + init_waitqueue_head(&cil->xc_push_wait); init_rwsem(&cil->xc_ctx_lock); init_waitqueue_head(&cil->xc_commit_wait);
INIT_LIST_HEAD(&ctx->committing); INIT_LIST_HEAD(&ctx->busy_extents); - init_waitqueue_head(&ctx->push_wait); ctx->sequence = 1; ctx->cil = cil; cil->xc_ctx = ctx; --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -247,7 +247,6 @@ struct xfs_cil_ctx { struct xfs_log_vec *lv_chain; /* logvecs being pushed */ struct list_head iclog_entry; struct list_head committing; /* ctx committing list */ - wait_queue_head_t push_wait; /* background push throttle */ struct work_struct discard_endio_work; };
@@ -281,6 +280,7 @@ struct xfs_cil { wait_queue_head_t xc_commit_wait; xfs_lsn_t xc_current_sequence; struct work_struct xc_push_work; + wait_queue_head_t xc_push_wait; /* background push throttle */ } ____cacheline_aligned_in_smp;
/*
From: Joseph Qi joseph.qi@linux.alibaba.com
commit 28f4821b1b53e0649706912e810c6c232fc506f9 upstream.
In ocfs2_mknod(), if an error occurs after the dinode has been successfully allocated, the ocfs2 i_links_count will not be 0.

So even though we clear the inode i_nlink before iput in the error handling, it still won't wipe the inode, since we'll refresh the inode from the dinode during inode lock. So just as we clear the inode i_nlink, clear the ocfs2 i_links_count as well. Also make the same change in ocfs2_symlink().
Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.co... Signed-off-by: Joseph Qi joseph.qi@linux.alibaba.com Reported-by: Yan Wang wangyan122@huawei.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/namei.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -231,6 +231,7 @@ static int ocfs2_mknod(struct inode *dir handle_t *handle = NULL; struct ocfs2_super *osb; struct ocfs2_dinode *dirfe; + struct ocfs2_dinode *fe = NULL; struct buffer_head *new_fe_bh = NULL; struct inode *inode = NULL; struct ocfs2_alloc_context *inode_ac = NULL; @@ -381,6 +382,7 @@ static int ocfs2_mknod(struct inode *dir goto leave; }
+ fe = (struct ocfs2_dinode *) new_fe_bh->b_data; if (S_ISDIR(mode)) { status = ocfs2_fill_new_dir(osb, handle, dir, inode, new_fe_bh, data_ac, meta_ac); @@ -446,8 +448,11 @@ static int ocfs2_mknod(struct inode *dir leave: if (status < 0 && did_quota_inode) dquot_free_inode(inode); - if (handle) + if (handle) { + if (status < 0 && fe) + ocfs2_set_links_count(fe, 0); ocfs2_commit_trans(osb, handle); + }
ocfs2_inode_unlock(dir, 1); if (did_block_signals) @@ -2017,8 +2022,11 @@ bail: ocfs2_clusters_to_bytes(osb->sb, 1)); if (status < 0 && did_quota_inode) dquot_free_inode(inode); - if (handle) + if (handle) { + if (status < 0 && fe) + ocfs2_set_links_count(fe, 0); ocfs2_commit_trans(osb, handle); + }
ocfs2_inode_unlock(dir, 1); if (did_block_signals)
From: Joseph Qi joseph.qi@linux.alibaba.com
commit 759a7c6126eef5635506453e9b9d55a6a3ac2084 upstream.
Commit b1529a41f777 "ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed inode if __ocfs2_mknod_locked() fails later. But this introduced a race: the freed bit may be reused immediately by another thread, which will update the dinode, e.g. i_generation. An iput of this inode will then lead to BUG: inode->i_generation != le32_to_cpu(fe->i_generation)

We could mark this inode as bad, but we do want to do operations like wipe in some cases. Since the claimed inode bit only means that a dinode is missing, and it will come back after fsck, this seems not to be a big problem. So just leave it as is, by reverting the reclaim logic.
Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.co... Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error") Signed-off-by: Joseph Qi joseph.qi@linux.alibaba.com Reported-by: Yan Wang wangyan122@huawei.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/namei.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)
--- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -630,18 +630,9 @@ static int ocfs2_mknod_locked(struct ocf return status; }
- status = __ocfs2_mknod_locked(dir, inode, dev, new_fe_bh, + return __ocfs2_mknod_locked(dir, inode, dev, new_fe_bh, parent_fe_bh, handle, inode_ac, fe_blkno, suballoc_loc, suballoc_bit); - if (status < 0) { - u64 bg_blkno = ocfs2_which_suballoc_group(fe_blkno, suballoc_bit); - int tmp = ocfs2_free_suballoc_bits(handle, inode_ac->ac_inode, - inode_ac->ac_bh, suballoc_bit, bg_blkno, 1); - if (tmp) - mlog_errno(tmp); - } - - return status; }
static int ocfs2_mkdir(struct inode *dir,
From: Borislav Petkov bp@suse.de
commit e7ad18d1169c62e6c78c01ff693fd362d9d65278 upstream.
Currently, the patch application logic checks whether the revision needs to be applied on each logical CPU (SMT thread). Therefore, on SMT designs where the microcode engine is shared between the two threads, the application happens only on one of them as that is enough to update the shared microcode engine.
However, there are microcode patches which do per-thread modification, see Link tag below.
Therefore, drop the revision check and try applying on each thread. This is what the BIOS does too, so this method is very well tested.
Btw, change only the early paths. On the late loading paths, there's no point in doing per-thread modification because if it is a case like the one in the bugzilla below - removing a CPUID flag - the kernel cannot go and un-use features it has detected are there early. For that, one should use early loading anyway.
[ bp: Fixes does not contain the oldest commit which did check for equality but that is good enough. ]
Fixes: 8801b3fcb574 ("x86/microcode/AMD: Rework container parsing") Reported-by: Ștefan Talpalaru stefantalpalaru@yahoo.com Signed-off-by: Borislav Petkov bp@suse.de Tested-by: Ștefan Talpalaru stefantalpalaru@yahoo.com Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/microcode/amd.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -441,7 +441,13 @@ apply_microcode_early_amd(u32 cpuid_1_ea return ret;
native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy); - if (rev >= mc->hdr.patch_id) + + /* + * Allow application of the same revision to pick up SMT-specific + * changes even if the revision of the other SMT thread is already + * up-to-date. + */ + if (rev > mc->hdr.patch_id) return ret;
if (!__apply_microcode_amd(mc)) { @@ -523,8 +529,12 @@ void load_ucode_amd_ap(unsigned int cpui
native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
- /* Check whether we have saved a new patch already: */ - if (*new_rev && rev < mc->hdr.patch_id) { + /* + * Check whether a new patch has been saved already. Also, allow application of + * the same revision in order to pick up SMT-thread-specific configuration even + * if the sibling SMT thread already has an up-to-date revision. + */ + if (*new_rev && rev <= mc->hdr.patch_id) { if (!__apply_microcode_amd(mc)) { *new_rev = mc->hdr.patch_id; return;
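[ Note: an illustrative comparison of the old and new skip conditions; the revision value is made up. Allowing rev == patch_id to be applied again is what lets the sibling SMT thread pick up per-thread changes. ]

#include <stdio.h>

int main(void)
{
	unsigned int rev = 0x08701021;		/* revision already live on this thread */
	unsigned int patch_id = 0x08701021;	/* revision of the saved patch */

	/* old early-path check: skip when rev >= patch_id */
	printf("old: %s\n", rev >= patch_id ? "skipped" : "applied");

	/* new check: skip only when rev > patch_id, so the same
	 * revision is re-applied on each SMT thread */
	printf("new: %s\n", rev > patch_id ? "skipped" : "applied");
	return 0;
}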
From: Zhang Rui rui.zhang@intel.com
commit 7108b80a542b9d65e44b36d64a700a83658c0b73 upstream.
The coretemp driver supports up to a hard-coded limit of 128 cores.
Today, the driver cannot support a core with an ID above that limit. Yet, the encoding of core IDs is arbitrary (BIOS APIC-ID), and so they may be sparse and they may be large.

Update the driver to map arbitrary core ID numbers into appropriate array indexes so that 128 cores can be supported, no matter the encoding of the core IDs.
Signed-off-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Len Brown len.brown@intel.com Acked-by: Guenter Roeck linux@roeck-us.net Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221014090147.1836-3-rui.zhang@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwmon/coretemp.c | 56 ++++++++++++++++++++++++++++++++++------------- 1 file changed, 41 insertions(+), 15 deletions(-)
--- a/drivers/hwmon/coretemp.c +++ b/drivers/hwmon/coretemp.c @@ -46,9 +46,6 @@ MODULE_PARM_DESC(tjmax, "TjMax value in #define TOTAL_ATTRS (MAX_CORE_ATTRS + 1) #define MAX_CORE_DATA (NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
-#define TO_CORE_ID(cpu) (cpu_data(cpu).cpu_core_id) -#define TO_ATTR_NO(cpu) (TO_CORE_ID(cpu) + BASE_SYSFS_ATTR_NO) - #ifdef CONFIG_SMP #define for_each_sibling(i, cpu) \ for_each_cpu(i, topology_sibling_cpumask(cpu)) @@ -91,6 +88,8 @@ struct temp_data { struct platform_data { struct device *hwmon_dev; u16 pkg_id; + u16 cpu_map[NUM_REAL_CORES]; + struct ida ida; struct cpumask cpumask; struct temp_data *core_data[MAX_CORE_DATA]; struct device_attribute name_attr; @@ -441,7 +440,7 @@ static struct temp_data *init_temp_data( MSR_IA32_THERM_STATUS; tdata->is_pkg_data = pkg_flag; tdata->cpu = cpu; - tdata->cpu_core_id = TO_CORE_ID(cpu); + tdata->cpu_core_id = topology_core_id(cpu); tdata->attr_size = MAX_CORE_ATTRS; mutex_init(&tdata->update_lock); return tdata; @@ -454,7 +453,7 @@ static int create_core_data(struct platf struct platform_data *pdata = platform_get_drvdata(pdev); struct cpuinfo_x86 *c = &cpu_data(cpu); u32 eax, edx; - int err, attr_no; + int err, index, attr_no;
/* * Find attr number for sysfs: @@ -462,14 +461,26 @@ static int create_core_data(struct platf * The attr number is always core id + 2 * The Pkgtemp will always show up as temp1_*, if available */ - attr_no = pkg_flag ? PKG_SYSFS_ATTR_NO : TO_ATTR_NO(cpu); + if (pkg_flag) { + attr_no = PKG_SYSFS_ATTR_NO; + } else { + index = ida_alloc(&pdata->ida, GFP_KERNEL); + if (index < 0) + return index; + pdata->cpu_map[index] = topology_core_id(cpu); + attr_no = index + BASE_SYSFS_ATTR_NO; + }
- if (attr_no > MAX_CORE_DATA - 1) - return -ERANGE; + if (attr_no > MAX_CORE_DATA - 1) { + err = -ERANGE; + goto ida_free; + }
tdata = init_temp_data(cpu, pkg_flag); - if (!tdata) - return -ENOMEM; + if (!tdata) { + err = -ENOMEM; + goto ida_free; + }
/* Test if we can access the status register */ err = rdmsr_safe_on_cpu(cpu, tdata->status_reg, &eax, &edx); @@ -505,6 +516,9 @@ static int create_core_data(struct platf exit_free: pdata->core_data[attr_no] = NULL; kfree(tdata); +ida_free: + if (!pkg_flag) + ida_free(&pdata->ida, index); return err; }
@@ -524,6 +538,9 @@ static void coretemp_remove_core(struct
kfree(pdata->core_data[indx]); pdata->core_data[indx] = NULL; + + if (indx >= BASE_SYSFS_ATTR_NO) + ida_free(&pdata->ida, indx - BASE_SYSFS_ATTR_NO); }
static int coretemp_probe(struct platform_device *pdev) @@ -537,6 +554,7 @@ static int coretemp_probe(struct platfor return -ENOMEM;
pdata->pkg_id = pdev->id; + ida_init(&pdata->ida); platform_set_drvdata(pdev, pdata);
pdata->hwmon_dev = devm_hwmon_device_register_with_groups(dev, DRVNAME, @@ -553,6 +571,7 @@ static int coretemp_remove(struct platfo if (pdata->core_data[i]) coretemp_remove_core(pdata, i);
+ ida_destroy(&pdata->ida); return 0; }
@@ -647,7 +666,7 @@ static int coretemp_cpu_offline(unsigned struct platform_device *pdev = coretemp_get_pdev(cpu); struct platform_data *pd; struct temp_data *tdata; - int indx, target; + int i, indx = -1, target;
/* * Don't execute this on suspend as the device remove locks @@ -660,12 +679,19 @@ static int coretemp_cpu_offline(unsigned if (!pdev) return 0;
- /* The core id is too big, just return */ - indx = TO_ATTR_NO(cpu); - if (indx > MAX_CORE_DATA - 1) + pd = platform_get_drvdata(pdev); + + for (i = 0; i < NUM_REAL_CORES; i++) { + if (pd->cpu_map[i] == topology_core_id(cpu)) { + indx = i + BASE_SYSFS_ATTR_NO; + break; + } + } + + /* Too many cores and this core is not populated, just return */ + if (indx < 0) return 0;
- pd = platform_get_drvdata(pdev); tdata = pd->core_data[indx];
cpumask_clear_cpu(cpu, &pd->cpumask);
From: Alexander Stein alexander.stein@ew.tq-group.com
commit 979556f1521a835a059de3b117b9c6c6642c7d58 upstream.
'ahci:' is an invalid prefix, preventing the module from autoloading. Fix this by using the 'platform:' prefix and DRV_NAME.
Fixes: 9e54eae23bc9 ("ahci_imx: add ahci sata support on imx platforms") Cc: stable@vger.kernel.org Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Reviewed-by: Fabio Estevam festevam@gmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/ahci_imx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/ata/ahci_imx.c +++ b/drivers/ata/ahci_imx.c @@ -1239,4 +1239,4 @@ module_platform_driver(imx_ahci_driver); MODULE_DESCRIPTION("Freescale i.MX AHCI SATA platform driver"); MODULE_AUTHOR("Richard Zhu Hong-Xing.Zhu@freescale.com"); MODULE_LICENSE("GPL"); -MODULE_ALIAS("ahci:imx"); +MODULE_ALIAS("platform:" DRV_NAME);
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit 1e41e693f458eef2d5728207dbd327cd3b16580a upstream.
UBSAN complains about array-index-out-of-bounds:

[ 1.980703] kernel: UBSAN: array-index-out-of-bounds in /build/linux-9H675w/linux-5.15.0/drivers/ata/libahci.c:968:41
[ 1.980709] kernel: index 15 is out of range for type 'ahci_em_priv [8]'
[ 1.980713] kernel: CPU: 0 PID: 209 Comm: scsi_eh_8 Not tainted 5.15.0-25-generic #25-Ubuntu
[ 1.980716] kernel: Hardware name: System manufacturer System Product Name/P5Q3, BIOS 1102 06/11/2010
[ 1.980718] kernel: Call Trace:
[ 1.980721] kernel:  <TASK>
[ 1.980723] kernel:  show_stack+0x52/0x58
[ 1.980729] kernel:  dump_stack_lvl+0x4a/0x5f
[ 1.980734] kernel:  dump_stack+0x10/0x12
[ 1.980736] kernel:  ubsan_epilogue+0x9/0x45
[ 1.980739] kernel:  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[ 1.980742] kernel:  ahci_qc_issue+0x166/0x170 [libahci]
[ 1.980748] kernel:  ata_qc_issue+0x135/0x240
[ 1.980752] kernel:  ata_exec_internal_sg+0x2c4/0x580
[ 1.980754] kernel:  ? vprintk_default+0x1d/0x20
[ 1.980759] kernel:  ata_exec_internal+0x67/0xa0
[ 1.980762] kernel:  sata_pmp_read+0x8d/0xc0
[ 1.980765] kernel:  sata_pmp_read_gscr+0x3c/0x90
[ 1.980768] kernel:  sata_pmp_attach+0x8b/0x310
[ 1.980771] kernel:  ata_eh_revalidate_and_attach+0x28c/0x4b0
[ 1.980775] kernel:  ata_eh_recover+0x6b6/0xb30
[ 1.980778] kernel:  ? ahci_do_hardreset+0x180/0x180 [libahci]
[ 1.980783] kernel:  ? ahci_stop_engine+0xb0/0xb0 [libahci]
[ 1.980787] kernel:  ? ahci_do_softreset+0x290/0x290 [libahci]
[ 1.980792] kernel:  ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0
[ 1.980795] kernel:  sata_pmp_eh_recover.isra.0+0x214/0x560
[ 1.980799] kernel:  sata_pmp_error_handler+0x23/0x40
[ 1.980802] kernel:  ahci_error_handler+0x43/0x80 [libahci]
[ 1.980806] kernel:  ata_scsi_port_error_handler+0x2b1/0x600
[ 1.980810] kernel:  ata_scsi_error+0x9c/0xd0
[ 1.980813] kernel:  scsi_error_handler+0xa1/0x180
[ 1.980817] kernel:  ? scsi_unjam_host+0x1c0/0x1c0
[ 1.980820] kernel:  kthread+0x12a/0x150
[ 1.980823] kernel:  ? set_kthread_struct+0x50/0x50
[ 1.980826] kernel:  ret_from_fork+0x22/0x30
[ 1.980831] kernel:  </TASK>
This happens because sata_pmp_init_links() initializes link->pmp with values up to SATA_PMP_MAX_PORTS, while em_priv is declared as an 8-element array.
The AHCI spec v1.3.1 does not specify a maximum number of Enclosure Management ports, but section "12.2.1 LED message type" states that "Port Multiplier Information" can utilize 4 bits, which implies up to 16 ports. Hence, use SATA_PMP_MAX_PORTS as EM_MAX_SLOTS to resolve the issue.
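Reduced to a sketch, the mismatch looks like this (the sizes 8 and 16 are illustrative stand-ins for the old EM_MAX_SLOTS and the 4-bit port space, not the kernel's constants):

#include <stdio.h>

#define OLD_SLOTS 8
#define NEW_SLOTS 16                    /* illustrative: enough for a 4-bit port field */

struct em_priv { unsigned long led_state; };

static struct em_priv em_old[OLD_SLOTS];
static struct em_priv em_new[NEW_SLOTS];

static int set_led(struct em_priv *arr, unsigned int nslots, unsigned int pmp)
{
        if (pmp >= nslots)              /* index 15 overruns em_old, fits em_new */
                return -1;
        arr[pmp].led_state = 1;
        return 0;
}

int main(void)
{
        printf("old: %d, new: %d\n", set_led(em_old, OLD_SLOTS, 15),
               set_led(em_new, NEW_SLOTS, 15));
        return 0;
}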
BugLink: https://bugs.launchpad.net/bugs/1970074 Cc: stable@vger.kernel.org Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/ahci.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/ata/ahci.h +++ b/drivers/ata/ahci.h @@ -254,7 +254,7 @@ enum { PCS_7 = 0x94, /* 7+ port PCS (Denverton) */
/* em constants */ - EM_MAX_SLOTS = 8, + EM_MAX_SLOTS = SATA_PMP_MAX_PORTS, EM_MAX_RETRY = 5,
/* em_ctl bits */
From: Eric Ren renzhengeek@gmail.com
commit c000a2607145d28b06c697f968491372ea56c23a upstream.
With some PCIe topologies, restoring a guest fails while parsing the ITS device tables.
Reproducer hints:

1. Create an ARM virt VM with a pxb-pcie bus, which adds extra host bridges, with a qemu command like:

```
-device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \
-device pcie-root-port,..,bus=pci.x \
...
-device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \
-device pcie-root-port,..,bus=pci.y \
...
```

2. Ensure the guest uses a 2-level device table
3. Perform VM migration, which calls save/restore on the device tables
In that setup, we get a big "offset" between two device_ids, which makes the unsigned "len" wrap around to a big positive number, causing the scan loop to continue with a bad GPA. For example:
1. The L1 table has 2 entries;
2. we are now scanning the L2 table at entry index 2075 (pointed to by the first L1 entry);
3. if the next device id is 9472, we will get a big offset: 7397;
4. with the unsigned 'len', 'len -= offset * esz' underflows to a large positive number, and we mistakenly continue into the next iteration with a bad GPA (we should instead break out of the current L2 table scan and jump to the next L1 table entry);
5. that bad GPA fails the guest read.
Fix it by stopping the L2 table scan when the next device id is outside of the current table, allowing the scan to continue from the next L1 table entry.
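The underflow in step 4 above is plain unsigned wrap-around, as this minimal sketch shows (the table size and entry size of 8 are illustrative; the offset is taken from the example):

#include <stdio.h>

int main(void)
{
        size_t len = 8 * 256;                   /* illustrative remaining table bytes */
        size_t byte_offset = 7397 * 8;          /* offset from the example above */

        len -= byte_offset;                     /* wraps to a huge positive value */
        printf("%zu\n", len);                   /* a "len > 0" loop condition stays true */
        return 0;
}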
Thanks to Eric Auger for the fix suggestion.
Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for table lookup") Suggested-by: Eric Auger eric.auger@redhat.com Signed-off-by: Eric Ren renzhengeek@gmail.com [maz: commit message tidy-up] Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/d9c3a564af9e2c5bf63f48a7dcbf08cd593c5c0b.166580298... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/arm/vgic/vgic-its.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -2095,7 +2095,7 @@ static int scan_its_table(struct vgic_it
memset(entry, 0, esz);
- while (len > 0) { + while (true) { int next_offset; size_t byte_offset;
@@ -2108,6 +2108,9 @@ static int scan_its_table(struct vgic_it return next_offset;
byte_offset = next_offset * esz; + if (byte_offset >= len) + break; + id += next_offset; gpa += byte_offset; len -= byte_offset;
From: Bryan O'Donoghue bryan.odonoghue@linaro.org
commit 06a2da340f762addc5935bf851d95b14d4692db2 upstream.
Debugging the decoder on msm8916 I noticed the vdec probe was crashing if the fmt pointer was NULL.
A similar fix from Colin Ian King found by Coverity was implemented for the encoder. Implement the same fix on the decoder.
Fixes: 7472c1c69138 ("[media] media: venus: vdec: add video decoder files") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Signed-off-by: Stanimir Varbanov stanimir.varbanov@linaro.org Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/qcom/venus/vdec.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/media/platform/qcom/venus/vdec.c +++ b/drivers/media/platform/qcom/venus/vdec.c @@ -157,6 +157,8 @@ vdec_try_fmt_common(struct venus_inst *i else return NULL; fmt = find_format(inst, pixmp->pixelformat, f->type); + if (!fmt) + return NULL; }
pixmp->width = clamp(pixmp->width, frame_width_min(inst),
From: James Morse james.morse@arm.com
commit 44b3834b2eed595af07021b1c64e6f9bc396398b upstream.
Cortex-A57 and Cortex-A72 have an erratum where an interrupt that occurs between a pair of AES instructions in aarch32 mode may corrupt the ELR. The task will subsequently produce the wrong AES result.
The AES instructions are part of the cryptographic extensions, which are optional. User-space software will detect the support for these instructions from the hwcaps. If the platform doesn't support these instructions, a software implementation should be used.
Remove the hwcap bits on affected parts to indicate user-space should not use the AES instructions.
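For reference, this is roughly how 32-bit user-space is expected to probe the hwcap and fall back; a minimal sketch using the standard getauxval() interface (the HWCAP2_AES fallback define matches the arm32 uapi value):

#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP2_AES
#define HWCAP2_AES (1 << 0)     /* arm32 AT_HWCAP2 bit, per <asm/hwcap.h> */
#endif

int main(void)
{
        if (getauxval(AT_HWCAP2) & HWCAP2_AES)
                printf("hardware AES available\n");
        else
                printf("no AES hwcap: use a software fallback\n");
        return 0;
}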
Acked-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: James Morse james.morse@arm.com Link: https://lore.kernel.org/r/20220714161523.279570-3-james.morse@arm.com Signed-off-by: Will Deacon will@kernel.org [florian: resolved conflicts in arch/arm64/tools/cpucaps and cpu_errata.c] Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/arm64/silicon-errata.rst | 4 ++++ arch/arm64/Kconfig | 16 ++++++++++++++++ arch/arm64/include/asm/cpucaps.h | 3 ++- arch/arm64/kernel/cpu_errata.c | 16 ++++++++++++++++ arch/arm64/kernel/cpufeature.c | 13 ++++++++++++- 5 files changed, 50 insertions(+), 2 deletions(-)
--- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -70,8 +70,12 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A72 | #853709 | N/A | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -574,6 +574,22 @@ config ARM64_ERRATUM_1542419
If unsure, say Y.
+config ARM64_ERRATUM_1742098 + bool "Cortex-A57/A72: 1742098: ELR recorded incorrectly on interrupt taken between cryptographic instructions in a sequence" + depends on COMPAT + default y + help + This option removes the AES hwcap for aarch32 user-space to + workaround erratum 1742098 on Cortex-A57 and Cortex-A72. + + Affected parts may corrupt the AES state if an interrupt is + taken between a pair of AES instructions. These instructions + are only present if the cryptography extensions are present. + All software should have a fallback implementation for CPUs + that don't implement the cryptography extensions. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -56,7 +56,8 @@ #define ARM64_WORKAROUND_CAVIUM_TX2_219_PRFM 46 #define ARM64_WORKAROUND_1542419 47 #define ARM64_SPECTRE_BHB 48 +#define ARM64_WORKAROUND_1742098 49
-#define ARM64_NCAPS 49 +#define ARM64_NCAPS 50
#endif /* __ASM_CPUCAPS_H */ --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -817,6 +817,14 @@ static const struct arm64_cpu_capabiliti }; #endif
+#ifdef CONFIG_ARM64_ERRATUM_1742098 +static struct midr_range broken_aarch32_aes[] = { + MIDR_RANGE(MIDR_CORTEX_A57, 0, 1, 0xf, 0xf), + MIDR_ALL_VERSIONS(MIDR_CORTEX_A72), + {}, +}; +#endif + const struct arm64_cpu_capabilities arm64_errata[] = { #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE { @@ -998,6 +1006,14 @@ const struct arm64_cpu_capabilities arm6 .cpu_enable = cpu_enable_trap_ctr_access, }, #endif +#ifdef CONFIG_ARM64_ERRATUM_1742098 + { + .desc = "ARM erratum 1742098", + .capability = ARM64_WORKAROUND_1742098, + CAP_MIDR_RANGE_LIST(broken_aarch32_aes), + .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, + }, +#endif { } }; --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -21,6 +21,7 @@ #include <asm/cpufeature.h> #include <asm/cpu_ops.h> #include <asm/fpsimd.h> +#include <asm/hwcap.h> #include <asm/mmu_context.h> #include <asm/processor.h> #include <asm/sysreg.h> @@ -1280,6 +1281,14 @@ static bool can_use_gic_priorities(const } #endif
+static void elf_hwcap_fixup(void) +{ +#ifdef CONFIG_ARM64_ERRATUM_1742098 + if (cpus_have_const_cap(ARM64_WORKAROUND_1742098)) + compat_elf_hwcap2 &= ~COMPAT_HWCAP2_AES; +#endif /* ARM64_ERRATUM_1742098 */ +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -2103,8 +2112,10 @@ void __init setup_cpu_features(void) mark_const_caps_ready(); setup_elf_hwcaps(arm64_elf_hwcaps);
- if (system_supports_32bit_el0()) + if (system_supports_32bit_el0()) { setup_elf_hwcaps(compat_elf_hwcaps); + elf_hwcap_fixup(); + }
if (system_uses_ttbr0_pan()) pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
From: Jean-Francois Le Fillatre jflf_kernel@gmx.com
commit 1bd3a383075c64d638e65d263c9267b08ee7733c upstream.
The Lenovo OneLink+ Dock contains an RTL8153 controller that behaves as a broken CDC device by default. Add the custom Lenovo PID to the r8152 driver to support it properly.
Also, systems compatible with this dock provide a BIOS option to enable MAC address passthrough (as per Lenovo document "ThinkPad Docking Solutions 2017"). Add the custom PID to the MAC passthrough list too.
Tested on a ThinkPad 13 1st gen with the expected results:
passthrough disabled: Invalid header when reading pass-thru MAC addr passthrough enabled: Using pass-thru MAC addr XX:XX:XX:XX:XX:XX
Signed-off-by: Jean-Francois Le Fillatre jflf_kernel@gmx.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/cdc_ether.c | 7 +++++++ drivers/net/usb/r8152.c | 1 + 2 files changed, 8 insertions(+)
--- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -764,6 +764,13 @@ static const struct usb_device_id produc }, #endif
+/* Lenovo ThinkPad OneLink+ Dock (based on Realtek RTL8153) */ +{ + USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x3054, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + .driver_info = 0, +}, + /* ThinkPad USB-C Dock (based on Realtek RTL8153) */ { USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x3062, USB_CLASS_COMM, --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -5823,6 +5823,7 @@ static const struct usb_device_id rtl815 {REALTEK_USB_DEVICE(VENDOR_ID_MICROSOFT, 0x0927)}, {REALTEK_USB_DEVICE(VENDOR_ID_SAMSUNG, 0xa101)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x304f)}, + {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x3054)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x3062)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x3069)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x7205)},
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 4fc7b57228243d09c0d878873bf24fa64a90fa01 ]
When processing delayed data references during backref walking and we are using a shared context (we are being called through fiemap), whenever we find a delayed data reference for an inode different from the one we are interested in, we immediately exit and consider the data extent as shared. This is wrong, because:
1) This might be a DROP reference that will cancel out a reference in the extent tree;
2) Even if it's an ADD reference, it may be followed by a DROP reference that cancels it out.
In either case we should not exit immediately.
Fix this by never exiting when we find a delayed data reference for another inode. Instead, add the reference and, if it does not cancel out another delayed reference, we will exit early when we call extent_is_shared() after processing all delayed references. If we find a drop reference, signal the code that processes references from the extent tree (add_inline_refs() and add_keyed_refs()) not to exit immediately if it finds a reference there for another inode, since we have delayed drop references that may cancel it out. In this latter case we exit once we don't have references in the rb trees that cancel each other out and we have two references for different inodes.
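The cancel-out idea reduces to net-counting references per inode rather than bailing out on the first foreign reference; a minimal sketch (types and names are hypothetical, not the btrfs code):

#include <stdio.h>

struct delayed_ref {
        unsigned long long inum;
        int count;                      /* +1 for ADD, -1 for DROP */
};

static int extent_shared(const struct delayed_ref *refs, int nr,
                         unsigned long long our_inum)
{
        long long other = 0;
        int i;

        for (i = 0; i < nr; i++)
                if (refs[i].inum != our_inum)
                        other += refs[i].count; /* ADD and DROP cancel out */

        return other > 0;               /* shared only if a net ref remains */
}

int main(void)
{
        /* ADD then DROP for inode 258: they cancel, so not shared */
        struct delayed_ref refs[] = { { 258, +1 }, { 258, -1 } };

        printf("%d\n", extent_shared(refs, 2, 257));
        return 0;
}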
Example reproducer for case 1):
$ cat test-1.sh
#!/bin/bash

DEV=/dev/sdj
MNT=/mnt/sdj

mkfs.btrfs -f $DEV
mount $DEV $MNT

xfs_io -f -c "pwrite 0 64K" $MNT/foo
cp --reflink=always $MNT/foo $MNT/bar

echo
echo "fiemap after cloning:"
xfs_io -c "fiemap -v" $MNT/foo

rm -f $MNT/bar
echo
echo "fiemap after removing file bar:"
xfs_io -c "fiemap -v" $MNT/foo

umount $MNT
Running it before this patch, the extent is still listed as shared; it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set:
$ ./test-1.sh
fiemap after cloning:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x2001

fiemap after removing file bar:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x2001
Example reproducer for case 2):
$ cat test-2.sh
#!/bin/bash

DEV=/dev/sdj
MNT=/mnt/sdj

mkfs.btrfs -f $DEV
mount $DEV $MNT

xfs_io -f -c "pwrite 0 64K" $MNT/foo
cp --reflink=always $MNT/foo $MNT/bar

# Flush delayed references to the extent tree and commit current
# transaction.
sync

echo
echo "fiemap after cloning:"
xfs_io -c "fiemap -v" $MNT/foo

rm -f $MNT/bar
echo
echo "fiemap after removing file bar:"
xfs_io -c "fiemap -v" $MNT/foo

umount $MNT
Running it before this patch, the extent is still listed as shared; it has the flag 0x2000 (FIEMAP_EXTENT_SHARED) set:
$ ./test-2.sh
fiemap after cloning:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x2001

fiemap after removing file bar:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x2001
After this patch, after deleting bar in both tests, the extent is no longer reported with the 0x2000 flag; it gets only the flag 0x1 (which is FIEMAP_EXTENT_LAST):
$ ./test-1.sh
fiemap after cloning:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x2001

fiemap after removing file bar:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x1

$ ./test-2.sh
fiemap after cloning:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x2001

fiemap after removing file bar:
/mnt/sdj/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        26624..26751       128 0x1
These tests will later be converted to a test case for fstests.
Fixes: dc046b10c8b7d4 ("Btrfs: make fiemap not blow when you have lots of snapshots") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/backref.c | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index c701a19fac53..9c969b90aec4 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -136,6 +136,7 @@ struct share_check { u64 root_objectid; u64 inum; int share_count; + bool have_delayed_delete_refs; };
static inline int extent_is_shared(struct share_check *sc) @@ -876,13 +877,22 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, key.offset = ref->offset;
/* - * Found a inum that doesn't match our known inum, we - * know it's shared. + * If we have a share check context and a reference for + * another inode, we can't exit immediately. This is + * because even if this is a BTRFS_ADD_DELAYED_REF + * reference we may find next a BTRFS_DROP_DELAYED_REF + * which cancels out this ADD reference. + * + * If this is a DROP reference and there was no previous + * ADD reference, then we need to signal that when we + * process references from the extent tree (through + * add_inline_refs() and add_keyed_refs()), we should + * not exit early if we find a reference for another + * inode, because one of the delayed DROP references + * may cancel that reference in the extent tree. */ - if (sc && sc->inum && ref->objectid != sc->inum) { - ret = BACKREF_FOUND_SHARED; - goto out; - } + if (sc && count < 0) + sc->have_delayed_delete_refs = true;
ret = add_indirect_ref(fs_info, preftrees, ref->root, &key, 0, node->bytenr, count, sc, @@ -912,7 +922,7 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, } if (!ret) ret = extent_is_shared(sc); -out: + spin_unlock(&head->lock); return ret; } @@ -1015,7 +1025,8 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, key.type = BTRFS_EXTENT_DATA_KEY; key.offset = btrfs_extent_data_ref_offset(leaf, dref);
- if (sc && sc->inum && key.objectid != sc->inum) { + if (sc && sc->inum && key.objectid != sc->inum && + !sc->have_delayed_delete_refs) { ret = BACKREF_FOUND_SHARED; break; } @@ -1025,6 +1036,7 @@ static int add_inline_refs(const struct btrfs_fs_info *fs_info, ret = add_indirect_ref(fs_info, preftrees, root, &key, 0, bytenr, count, sc, GFP_NOFS); + break; } default: @@ -1114,7 +1126,8 @@ static int add_keyed_refs(struct btrfs_fs_info *fs_info, key.type = BTRFS_EXTENT_DATA_KEY; key.offset = btrfs_extent_data_ref_offset(leaf, dref);
- if (sc && sc->inum && key.objectid != sc->inum) { + if (sc && sc->inum && key.objectid != sc->inum && + !sc->have_delayed_delete_refs) { ret = BACKREF_FOUND_SHARED; break; } @@ -1537,6 +1550,7 @@ int btrfs_check_shared(struct btrfs_root *root, u64 inum, u64 bytenr, .root_objectid = root->root_key.objectid, .inum = inum, .share_count = 0, + .have_delayed_delete_refs = false, };
ulist_init(roots); @@ -1571,6 +1585,7 @@ int btrfs_check_shared(struct btrfs_root *root, u64 inum, u64 bytenr, break; bytenr = node->val; shared.share_count = 0; + shared.have_delayed_delete_refs = false; cond_resched(); }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 943553ef9b51db303ab2b955c1025261abfdf6fb ]
During backref walking, when processing a delayed reference with a type of BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there:
1) We are accessing the delayed reference's extent_op, and its key, without the protection of the delayed ref head's lock;
2) If there's no extent op for the delayed ref head, we end up with an uninitialized key in the stack, variable 'tmp_op_key', and then pass it to add_indirect_ref(), which adds the reference to the indirect refs rb tree.
This is wrong, because indirect references should have a NULL key when we don't have access to the key, and in that case they should be added to the indirect_missing_keys rb tree and not to the indirect rb tree.
This means that if we have a BTRFS_TREE_BLOCK_REF_KEY delayed ref resulting from freeing an extent buffer, and therefore with a count of -1, it will not cancel out the corresponding reference we have in the extent tree (with a count of 1), since both references end up in different rb trees.
When using fiemap, where we often need to check if extents are shared through shared subtrees resulting from snapshots, it means we can incorrectly report an extent as shared when it's no longer shared. However this is temporary because after the transaction is committed the extent is no longer reported as shared, as running the delayed reference results in deleting the tree block reference from the extent tree.
Outside the fiemap context, the result is unpredictable, as the key was not initialized but it's used when navigating the rb trees to insert and search for references (prelim_ref_compare()), and we expect all references in the indirect rb tree to have valid keys.
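Bug 2) is the classic uninitialized-stack-variable pattern; a minimal sketch (all names hypothetical):

#include <stdio.h>

struct key_like {
        unsigned long long objectid;
        unsigned char type;
        unsigned long long offset;
};

/* NULL means "key unknown": such refs belong in the missing-keys tree */
static void add_ref(const struct key_like *key)
{
        if (key)
                printf("indirect ref, objectid %llu\n", key->objectid);
        else
                printf("indirect ref, key missing\n");
}

static void process_ref(int have_extent_op)
{
        struct key_like tmp;            /* uninitialized stack memory */

        if (have_extent_op)
                tmp = (struct key_like){ 1, 0, 0 };
        /* the buggy version passed &tmp unconditionally, leaking stack
         * garbage into the lookup; the fix passes NULL when no extent op
         * provides a key */
        add_ref(have_extent_op ? &tmp : NULL);
}

int main(void)
{
        process_ref(1);
        process_ref(0);
        return 0;
}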
The following reproducer triggers the second bug:
$ cat test.sh
#!/bin/bash

DEV=/dev/sdj
MNT=/mnt/sdj

mkfs.btrfs -f $DEV
mount -o compress $DEV $MNT

# With a compressed 128M file we get a tree height of 2 (level 1 root).
xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo

btrfs subvolume snapshot $MNT $MNT/snap

# Fiemap should output 0x2008 in the flags column.
# 0x2000 means shared extent
# 0x8 means encoded extent (because it's compressed)
echo
echo "fiemap after snapshot, range [120M, 120M + 128K):"
xfs_io -c "fiemap -v 120M 128K" $MNT/foo
echo

# Overwrite one extent and fsync to flush delalloc and COW a new path
# in the snapshot's tree.
#
# After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type
# BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent
# buffer in the path.
#
# In the extent tree we have inline references of type
# BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent
# buffers, so they should cancel each other, and the extent buffers in
# the fs tree should no longer be considered as shared.
#
echo "Overwriting file range [120M, 120M + 128K)..."
xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo
xfs_io -c "fsync" $MNT/snap/foo

# Fiemap should output 0x8 in the flags column. The extent in the range
# [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs
# tree.
echo
echo "fiemap after overwrite range [120M, 120M + 128K):"
xfs_io -c "fiemap -v 120M 128K" $MNT/foo
echo

umount $MNT
Running it before this patch:
$ ./test.sh
(...)
wrote 134217728/134217728 bytes at offset 0
128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec)
Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'

fiemap after snapshot, range [120M, 120M + 128K):
/mnt/sdj/foo:
 EXT: FILE-OFFSET        BLOCK-RANGE    TOTAL FLAGS
   0: [245760..246015]:  34304..34559     256 0x2008

Overwriting file range [120M, 120M + 128K)...
wrote 131072/131072 bytes at offset 125829120
128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec)

fiemap after overwrite range [120M, 120M + 128K):
/mnt/sdj/foo:
 EXT: FILE-OFFSET        BLOCK-RANGE    TOTAL FLAGS
   0: [245760..246015]:  34304..34559     256 0x2008
The extent in the range [120M, 120M + 128K) is still reported as shared (0x2000 bit set) after overwriting that range and flushing delalloc, which is not correct - an entire path was COWed in the snapshot's tree and the extent is now only referenced by the original fs tree.
Running it after this patch:
$ ./test.sh
(...)
wrote 134217728/134217728 bytes at offset 0
128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec)
Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'

fiemap after snapshot, range [120M, 120M + 128K):
/mnt/sdj/foo:
 EXT: FILE-OFFSET        BLOCK-RANGE    TOTAL FLAGS
   0: [245760..246015]:  34304..34559     256 0x2008

Overwriting file range [120M, 120M + 128K)...
wrote 131072/131072 bytes at offset 125829120
128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec)

fiemap after overwrite range [120M, 120M + 128K):
/mnt/sdj/foo:
 EXT: FILE-OFFSET        BLOCK-RANGE    TOTAL FLAGS
   0: [245760..246015]:  34304..34559     256 0x8
Now the extent is not reported as shared anymore.
So fix this by passing a NULL key pointer to add_indirect_ref() when processing a delayed reference for a tree block if there's no extent op for our delayed ref head with a defined key. Also access the extent op only after taking the delayed ref head's lock.
The reproducer will be converted later to a test case for fstests.
Fixes: 86d5f994425252 ("btrfs: convert prelimary reference tracking to use rbtrees") Fixes: a6dbceafb915e8 ("btrfs: Remove unused op_key var from add_delayed_refs") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/backref.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 9c969b90aec4..7147bb66a482 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -813,16 +813,11 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, struct preftrees *preftrees, struct share_check *sc) { struct btrfs_delayed_ref_node *node; - struct btrfs_delayed_extent_op *extent_op = head->extent_op; struct btrfs_key key; - struct btrfs_key tmp_op_key; struct rb_node *n; int count; int ret = 0;
- if (extent_op && extent_op->update_key) - btrfs_disk_key_to_cpu(&tmp_op_key, &extent_op->key); - spin_lock(&head->lock); for (n = rb_first_cached(&head->ref_tree); n; n = rb_next(n)) { node = rb_entry(n, struct btrfs_delayed_ref_node, @@ -848,10 +843,16 @@ static int add_delayed_refs(const struct btrfs_fs_info *fs_info, case BTRFS_TREE_BLOCK_REF_KEY: { /* NORMAL INDIRECT METADATA backref */ struct btrfs_delayed_tree_ref *ref; + struct btrfs_key *key_ptr = NULL; + + if (head->extent_op && head->extent_op->update_key) { + btrfs_disk_key_to_cpu(&key, &head->extent_op->key); + key_ptr = &key; + }
ref = btrfs_delayed_node_to_tree_ref(node); ret = add_indirect_ref(fs_info, preftrees, ref->root, - &tmp_op_key, ref->level + 1, + key_ptr, ref->level + 1, node->bytenr, count, sc, GFP_ATOMIC); break;
From: Tony Luck tony.luck@intel.com
[ Upstream commit f6ec01da40e4139b41179f046044ee7c4f6370dc ]
If there is no user space consumer of extlog_mem trace records, then Linux properly handles multiple error records in an ELOG block
extlog_print()
  print_extlog_rcd()
    __print_extlog_rcd()
      cper_estatus_print()
        apei_estatus_for_each_section()
But the other code path hard-codes the assumption of a single record when outputting a trace record.
Fix by using the same apei_estatus_for_each_section() iterator to step over all records.
Fixes: 2dfb7d51a61d ("trace, RAS: Add eMCA trace event interface") Signed-off-by: Tony Luck tony.luck@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/acpi_extlog.c | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-)
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c index 91d0b0fc392b..4c05c3828c9e 100644 --- a/drivers/acpi/acpi_extlog.c +++ b/drivers/acpi/acpi_extlog.c @@ -12,6 +12,7 @@ #include <linux/ratelimit.h> #include <linux/edac.h> #include <linux/ras.h> +#include <acpi/ghes.h> #include <asm/cpu.h> #include <asm/mce.h>
@@ -140,8 +141,8 @@ static int extlog_print(struct notifier_block *nb, unsigned long val, int cpu = mce->extcpu; struct acpi_hest_generic_status *estatus, *tmp; struct acpi_hest_generic_data *gdata; - const guid_t *fru_id = &guid_null; - char *fru_text = ""; + const guid_t *fru_id; + char *fru_text; guid_t *sec_type; static u32 err_seq;
@@ -162,17 +163,23 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
/* log event via trace */ err_seq++; - gdata = (struct acpi_hest_generic_data *)(tmp + 1); - if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID) - fru_id = (guid_t *)gdata->fru_id; - if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT) - fru_text = gdata->fru_text; - sec_type = (guid_t *)gdata->section_type; - if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) { - struct cper_sec_mem_err *mem = (void *)(gdata + 1); - if (gdata->error_data_length >= sizeof(*mem)) - trace_extlog_mem_event(mem, err_seq, fru_id, fru_text, - (u8)gdata->error_severity); + apei_estatus_for_each_section(tmp, gdata) { + if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID) + fru_id = (guid_t *)gdata->fru_id; + else + fru_id = &guid_null; + if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT) + fru_text = gdata->fru_text; + else + fru_text = ""; + sec_type = (guid_t *)gdata->section_type; + if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) { + struct cper_sec_mem_err *mem = (void *)(gdata + 1); + + if (gdata->error_data_length >= sizeof(*mem)) + trace_extlog_mem_event(mem, err_seq, fru_id, fru_text, + (u8)gdata->error_severity); + } }
out:
From: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
[ Upstream commit 28be7ca4fcfd69a2d52aaa331adbf9dbe91f9e6e ]
The trial period exists until jiffies is after addr_trial_end. But as jiffies will eventually overflow, just using time_after will eventually give incorrect results. As the node address is set once the trial period ends, this can be used to know that we are not in the trial period.
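The wrap-around is visible with the jiffies comparison macro itself; a minimal sketch (the macro matches the <linux/jiffies.h> definition, the trial-end value is illustrative):

#include <stdio.h>
#include <limits.h>

#define time_before(a, b) ((long)((a) - (b)) < 0)      /* as in <linux/jiffies.h> */

int main(void)
{
        unsigned long trial_end = 1000;

        /* shortly after the trial the comparison is correct... */
        printf("%d\n", time_before(2000UL, trial_end));                /* 0: trial over */
        /* ...but once jiffies wraps far enough, it looks "before" again */
        printf("%d\n", time_before(trial_end + LONG_MAX + 2, trial_end)); /* 1 */
        return 0;
}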
Fixes: e415577f57f4 ("tipc: correct discovery message handling during address trial period") Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/discover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tipc/discover.c b/net/tipc/discover.c index c138d68e8a69..0006c9f87199 100644 --- a/net/tipc/discover.c +++ b/net/tipc/discover.c @@ -146,8 +146,8 @@ static bool tipc_disc_addr_trial_msg(struct tipc_discoverer *d, { struct net *net = d->net; struct tipc_net *tn = tipc_net(net); - bool trial = time_before(jiffies, tn->addr_trial_end); u32 self = tipc_own_addr(net); + bool trial = time_before(jiffies, tn->addr_trial_end) && !self;
if (mtyp == DSC_TRIAL_FAIL_MSG) { if (!trial)
From: Alexander Potapenko glider@google.com
[ Upstream commit 777ecaabd614d47c482a5c9031579e66da13989a ]
Use an 8-byte write to initialize sub.usr_handle in tipc_topsrv_kern_subscr(), otherwise four bytes remain uninitialized when issuing setsockopt(..., SOL_TIPC, ...). This resulted in an infoleak reported by KMSAN when the packet was received:
=====================================================
BUG: KMSAN: kernel-infoleak in copyout+0xbc/0x100 lib/iov_iter.c:169
 instrument_copy_to_user ./include/linux/instrumented.h:121
 copyout+0xbc/0x100 lib/iov_iter.c:169
 _copy_to_iter+0x5c0/0x20a0 lib/iov_iter.c:527
 copy_to_iter ./include/linux/uio.h:176
 simple_copy_to_iter+0x64/0xa0 net/core/datagram.c:513
 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419
 skb_copy_datagram_iter+0x58/0x200 net/core/datagram.c:527
 skb_copy_datagram_msg ./include/linux/skbuff.h:3903
 packet_recvmsg+0x521/0x1e70 net/packet/af_packet.c:3469
 ____sys_recvmsg+0x2c4/0x810 net/socket.c:?
 ___sys_recvmsg+0x217/0x840 net/socket.c:2743
 __sys_recvmsg net/socket.c:2773
 __do_sys_recvmsg net/socket.c:2783
 __se_sys_recvmsg net/socket.c:2780
 __x64_sys_recvmsg+0x364/0x540 net/socket.c:2780
 do_syscall_x64 arch/x86/entry/common.c:50
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120
...
Uninit was stored to memory at:
 tipc_sub_subscribe+0x42d/0xb50 net/tipc/subscr.c:156
 tipc_conn_rcv_sub+0x246/0x620 net/tipc/topsrv.c:375
 tipc_topsrv_kern_subscr+0x2e8/0x400 net/tipc/topsrv.c:579
 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190
 tipc_sk_join+0x2a8/0x770 net/tipc/socket.c:3084
 tipc_setsockopt+0xae5/0xe40 net/tipc/socket.c:3201
 __sys_setsockopt+0x87f/0xdc0 net/socket.c:2252
 __do_sys_setsockopt net/socket.c:2263
 __se_sys_setsockopt net/socket.c:2260
 __x64_sys_setsockopt+0xe0/0x160 net/socket.c:2260
 do_syscall_x64 arch/x86/entry/common.c:50
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120

Local variable sub created at:
 tipc_topsrv_kern_subscr+0x57/0x400 net/tipc/topsrv.c:562
 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190

Bytes 84-87 of 88 are uninitialized
Memory access of size 88 starts at ffff88801ed57cd0
Data copied to user address 0000000020000400
...
=====================================================
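The leak pattern reduces to writing 4 bytes into an 8-byte field and then sending the whole struct; a minimal sketch (the field name is borrowed from struct tipc_subscr; memcpy() is used here instead of the kernel's pointer cast for a portable illustration):

#include <stdio.h>
#include <string.h>

struct sub_like {
        char usr_handle[8];             /* 8-byte field, as in struct tipc_subscr */
};

int main(void)
{
        struct sub_like sub;            /* stack memory, not zeroed */
        unsigned int port32 = 0x1234;
        unsigned long long port64 = 0x1234;

        /* old code: only 4 of the 8 bytes written, bytes 4..7 keep stack
         * garbage and are later copied to the wire */
        memcpy(sub.usr_handle, &port32, sizeof(port32));

        /* fixed code: all 8 bytes written */
        memcpy(sub.usr_handle, &port64, sizeof(port64));

        printf("byte 7 = %02x\n", (unsigned char)sub.usr_handle[7]);
        return 0;
}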
Signed-off-by: Alexander Potapenko glider@google.com Fixes: 026321c6d056a5 ("tipc: rename tipc_server to tipc_topsrv") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/topsrv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index 444e1792d02c..b8797ff153e6 100644 --- a/net/tipc/topsrv.c +++ b/net/tipc/topsrv.c @@ -568,7 +568,7 @@ bool tipc_topsrv_kern_subscr(struct net *net, u32 port, u32 type, u32 lower, sub.seq.upper = upper; sub.timeout = TIPC_WAIT_FOREVER; sub.filter = filter; - *(u32 *)&sub.usr_handle = port; + *(u64 *)&sub.usr_handle = (u64)port;
con = tipc_conn_alloc(tipc_topsrv(net)); if (IS_ERR(con))
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit bb5f0c855dcfc893ae5ed90e4c646bde9e4498bf ]
Under certain conditions the Magic Trackpad can group 2 reports in a single packet. The packet is split and the raw event function is invoked recursively for each part.
However, after processing each part, the BTN_MOUSE status is updated, sending multiple click events. [1]
Return after processing double reports to avoid this issue.
Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/811 # [1] Fixes: a462230e16ac ("HID: magicmouse: enable Magic Trackpad support") Reported-by: Nulo git@nulo.in Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Link: https://lore.kernel.org/r/20221009182747.90730-1-jose.exposito89@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-magicmouse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-magicmouse.c b/drivers/hid/hid-magicmouse.c index fc4c07459753..28158d2f2352 100644 --- a/drivers/hid/hid-magicmouse.c +++ b/drivers/hid/hid-magicmouse.c @@ -387,7 +387,7 @@ static int magicmouse_raw_event(struct hid_device *hdev, magicmouse_raw_event(hdev, report, data + 2, data[1]); magicmouse_raw_event(hdev, report, data + 2 + data[1], size - 2 - data[1]); - break; + return 0; default: return 0; }
From: Xiaobo Liu cppcoffee@gmail.com
[ Upstream commit d8bde3bf7f82dac5fc68a62c2816793a12cafa2a ]
When the input contains '\0' or '\n', proc_mpc_write() has read that byte as well, so the return value needs +1.
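The off-by-one reduces to whether the consumed terminator byte is counted; a minimal sketch of the corrected loop (the input string is hypothetical; the real code copies from user space with get_user()):

#include <stdio.h>
#include <stddef.h>

/* Count the bytes consumed from the input, including a terminating
 * '\0' or '\n' if one was read. */
static size_t consumed(const char *buff, size_t nbytes)
{
        size_t len = 0;

        while (len < nbytes) {
                len++;                          /* this byte was read */
                if (buff[len - 1] == '\0' || buff[len - 1] == '\n')
                        break;                  /* terminator counts too */
        }
        return len;
}

int main(void)
{
        /* 12 bytes including the '\n'; the old code returned 11 */
        printf("%zu\n", consumed("add 1.2.3.4\n", 12));
        return 0;
}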
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xiaobo Liu cppcoffee@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/atm/mpoa_proc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/atm/mpoa_proc.c b/net/atm/mpoa_proc.c index 46d6cd9a36ae..c4e9538ac144 100644 --- a/net/atm/mpoa_proc.c +++ b/net/atm/mpoa_proc.c @@ -222,11 +222,12 @@ static ssize_t proc_mpc_write(struct file *file, const char __user *buff, if (!page) return -ENOMEM;
- for (p = page, len = 0; len < nbytes; p++, len++) { + for (p = page, len = 0; len < nbytes; p++) { if (get_user(*p, buff++)) { free_page((unsigned long)page); return -EFAULT; } + len += 1; if (*p == '\0' || *p == '\n') break; }
From: Harini Katakam harini.katakam@amd.com
[ Upstream commit 0c9efbd5c50c64ead434960a404c9c9a097b0403 ]
When the RX strap in HW is not set to MODE 3 or 4, bits 7 and 8 in the CF4 register should be set. The former is already handled in dp83867_config_init; add the latter in the SGMII-specific initialization.
Fixes: 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy") Signed-off-by: Harini Katakam harini.katakam@amd.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/dp83867.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c index 87c0cdbf262a..c7d91415a436 100644 --- a/drivers/net/phy/dp83867.c +++ b/drivers/net/phy/dp83867.c @@ -432,6 +432,14 @@ static int dp83867_config_init(struct phy_device *phydev) else val &= ~DP83867_SGMII_TYPE; phy_write_mmd(phydev, DP83867_DEVADDR, DP83867_SGMIICTL, val); + + /* This is a SW workaround for link instability if RX_CTRL is + * not strapped to mode 3 or 4 in HW. This is required for SGMII + * in addition to clearing bit 7, handled above. + */ + if (dp83867->rxctrl_strap_quirk) + phy_set_bits_mmd(phydev, DP83867_DEVADDR, DP83867_CFG4, + BIT(8)); }
val = phy_read(phydev, DP83867_CFG3);
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 51f9a8921ceacd7bf0d3f47fa867a64988ba1dcb ]
When the default qdisc is cake, if the qdisc of a dev_queue fails to be initialized during mqprio_init(), cake_reset() is invoked to clear resources. In this case q->tins is NULL, which causes a general protection fault.
The process is as follows:

qdisc_create_dflt()
  cake_init()
    q->tins = kvcalloc(...)   --->failed, q->tins is NULL
  ...
  qdisc_put()
    ...
    cake_reset()
      ...
      cake_dequeue_one()
        b = &q->tins[...]     --->q->tins is NULL
The following is the Call Trace information:

general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:cake_dequeue_one+0xc9/0x3c0
Call Trace:
<TASK>
 cake_reset+0xb1/0x140
 qdisc_reset+0xed/0x6f0
 qdisc_destroy+0x82/0x4c0
 qdisc_put+0x9e/0xb0
 qdisc_create_dflt+0x2c3/0x4a0
 mqprio_init+0xa71/0x1760
 qdisc_create+0x3eb/0x1000
 tc_modify_qdisc+0x408/0x1720
 rtnetlink_rcv_msg+0x38e/0xac0
 netlink_rcv_skb+0x12d/0x3a0
 netlink_unicast+0x4a2/0x740
 netlink_sendmsg+0x826/0xcc0
 sock_sendmsg+0xc5/0x100
 ____sys_sendmsg+0x583/0x690
 ___sys_sendmsg+0xe8/0x160
 __sys_sendmsg+0xbf/0x160
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f89e5122d04
</TASK>
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Acked-by: Toke Høiland-Jørgensen toke@toke.dk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_cake.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 0eb4d4a568f7..9e5e7fda0f4a 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -2190,8 +2190,12 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
static void cake_reset(struct Qdisc *sch) { + struct cake_sched_data *q = qdisc_priv(sch); u32 c;
+ if (!q->tins) + return; + for (c = 0; c < CAKE_MAX_TINS; c++) cake_clear_tin(sch, c); }
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit ff2f5ec5d009844ec28f171123f9e58750cef4bf ]
When injecting a fault while probing the module, if device_register() fails, the refcount of the kobject is not decreased to 0 and the name allocated in dev_set_name() is leaked. Fix this by calling put_device(), so that the name can be freed in the callback function kobject_cleanup().
unreferenced object 0xffff00c01aba2100 (size 128):
  comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s)
  hex dump (first 32 bytes):
    68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff  hnae0....!......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0
    [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0
    [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390
    [<000000006c0ffb13>] kvasprintf+0x8c/0x118
    [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8
    [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0
    [<000000000b87affc>] dev_set_name+0x7c/0xa0
    [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae]
    [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf]
    [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf]
Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20221018122451.1749171-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns/hnae.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c b/drivers/net/ethernet/hisilicon/hns/hnae.c index 08339278c722..7c838a75934e 100644 --- a/drivers/net/ethernet/hisilicon/hns/hnae.c +++ b/drivers/net/ethernet/hisilicon/hns/hnae.c @@ -419,8 +419,10 @@ int hnae_ae_register(struct hnae_ae_dev *hdev, struct module *owner) hdev->cls_dev.release = hnae_release; (void)dev_set_name(&hdev->cls_dev, "hnae%d", hdev->id); ret = device_register(&hdev->cls_dev); - if (ret) + if (ret) { + put_device(&hdev->cls_dev); return ret; + }
__module_get(THIS_MODULE);
From: Jerry Snitselaar jsnitsel@redhat.com
[ Upstream commit 620bf9f981365c18cc2766c53d92bf8131c63f32 ]
A splat from kmem_cache_destroy() was seen with a kernel prior to commit ee2653bbe89d ("iommu/vt-d: Remove domain and devinfo mempool") when there was a failure in init_dmars(), because the iommu_domain cache still had objects. While the mempool code is now gone, there still is a leak of the si_domain memory if init_dmars() fails. So clean up si_domain in the init_dmars() error path.
Cc: Lu Baolu baolu.lu@linux.intel.com Cc: Joerg Roedel joro@8bytes.org Cc: Will Deacon will@kernel.org Cc: Robin Murphy robin.murphy@arm.com Fixes: 86080ccc223a ("iommu/vt-d: Allocate si_domain in init_dmars()") Signed-off-by: Jerry Snitselaar jsnitsel@redhat.com Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/intel-iommu.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a2a03df97704..ff120d7ed342 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2751,6 +2751,7 @@ static int __init si_domain_init(int hw)
if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) { domain_exit(si_domain); + si_domain = NULL; return -EFAULT; }
@@ -3371,6 +3372,10 @@ static int __init init_dmars(void) disable_dmar_iommu(iommu); free_dmar_iommu(iommu); } + if (si_domain) { + domain_exit(si_domain); + si_domain = NULL; + }
kfree(g_iommus);
From: Conor Dooley conor.dooley@microchip.com
commit 456797da792fa7cbf6698febf275fe9b36691f78 upstream.
arm64's method of defining a default cpu topology requires only minimal changes to apply to RISC-V also. The current arm64 implementation exits early in a uniprocessor configuration by reading MPIDR and claiming that uniprocessor systems can rely on the default values.
This appears to be a hangover from prior to '3102bc0e6ac7 ("arm64: topology: Stop using MPIDR for topology information")', because the current code just assigns default values for multiprocessor systems.
With the MPIDR references removed, store_cpu_topology() can be moved to the common arch_topology code.
Reviewed-by: Sudeep Holla sudeep.holla@arm.com Acked-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Atish Patra atishp@rivosinc.com Signed-off-by: Conor Dooley conor.dooley@microchip.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/topology.c | 40 ---------------------------------------- drivers/base/arch_topology.c | 19 +++++++++++++++++++ 2 files changed, 19 insertions(+), 40 deletions(-)
--- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -21,46 +21,6 @@ #include <asm/cputype.h> #include <asm/topology.h>
-void store_cpu_topology(unsigned int cpuid) -{ - struct cpu_topology *cpuid_topo = &cpu_topology[cpuid]; - u64 mpidr; - - if (cpuid_topo->package_id != -1) - goto topology_populated; - - mpidr = read_cpuid_mpidr(); - - /* Uniprocessor systems can rely on default topology values */ - if (mpidr & MPIDR_UP_BITMASK) - return; - - /* - * This would be the place to create cpu topology based on MPIDR. - * - * However, it cannot be trusted to depict the actual topology; some - * pieces of the architecture enforce an artificial cap on Aff0 values - * (e.g. GICv3's ICC_SGI1R_EL1 limits it to 15), leading to an - * artificial cycling of Aff1, Aff2 and Aff3 values. IOW, these end up - * having absolutely no relationship to the actual underlying system - * topology, and cannot be reasonably used as core / package ID. - * - * If the MT bit is set, Aff0 *could* be used to define a thread ID, but - * we still wouldn't be able to obtain a sane core ID. This means we - * need to entirely ignore MPIDR for any topology deduction. - */ - cpuid_topo->thread_id = -1; - cpuid_topo->core_id = cpuid; - cpuid_topo->package_id = cpu_to_node(cpuid); - - pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n", - cpuid, cpuid_topo->package_id, cpuid_topo->core_id, - cpuid_topo->thread_id, mpidr); - -topology_populated: - update_siblings_masks(cpuid); -} - #ifdef CONFIG_ACPI static bool __init acpi_cpu_is_threaded(int cpu) { --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -538,4 +538,23 @@ void __init init_cpu_topology(void) else if (of_have_populated_dt() && parse_dt_topology()) reset_cpu_topology(); } + +void store_cpu_topology(unsigned int cpuid) +{ + struct cpu_topology *cpuid_topo = &cpu_topology[cpuid]; + + if (cpuid_topo->package_id != -1) + goto topology_populated; + + cpuid_topo->thread_id = -1; + cpuid_topo->core_id = cpuid; + cpuid_topo->package_id = cpu_to_node(cpuid); + + pr_debug("CPU%u: package %d core %d thread %d\n", + cpuid, cpuid_topo->package_id, cpuid_topo->core_id, + cpuid_topo->thread_id); + +topology_populated: + update_siblings_masks(cpuid); +} #endif
From: Conor Dooley conor.dooley@microchip.com
commit fbd92809997a391f28075f1c8b5ee314c225557c upstream.
RISC-V has no sane defaults to fall back on where there is no cpu-map in the devicetree. Without sane defaults, the package, core and thread IDs are all set to -1. This causes user-visible inaccuracies for tools like hwloc/lstopo which rely on the sysfs cpu topology files to detect a system's topology.
On a PolarFire SoC, which should have 4 harts with a thread each, lstopo currently reports:
Machine (793MB total)
  Package L#0
    NUMANode L#0 (P#0 793MB)
    Core L#0
      L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
      L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
      L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
Adding calls to store_cpu_topology() in the {boot,smp} hart bringup code results in the correct topology being reported:
Machine (793MB total)
  Package L#0
    NUMANode L#0 (P#0 793MB)
    L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
    L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
    L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
CC: stable@vger.kernel.org # 456797da792f: arm64: topology: move store_cpu_topology() to shared code Fixes: 03f11f03dbfe ("RISC-V: Parse cpu topology during boot.") Reported-by: Brice Goglin Brice.Goglin@inria.fr Link: https://github.com/open-mpi/hwloc/issues/536 Reviewed-by: Sudeep Holla sudeep.holla@arm.com Reviewed-by: Atish Patra atishp@rivosinc.com Signed-off-by: Conor Dooley conor.dooley@microchip.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Kconfig | 2 +- arch/riscv/kernel/smpboot.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -51,7 +51,7 @@ config RISCV select PCI_MSI if PCI select RISCV_TIMER select GENERIC_IRQ_MULTI_HANDLER - select GENERIC_ARCH_TOPOLOGY if SMP + select GENERIC_ARCH_TOPOLOGY select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_MMIOWB select HAVE_EBPF_JIT if 64BIT --- a/arch/riscv/kernel/smpboot.c +++ b/arch/riscv/kernel/smpboot.c @@ -46,6 +46,8 @@ void __init smp_prepare_cpus(unsigned in { int cpuid;
+ store_cpu_topology(smp_processor_id()); + /* This covers non-smp usecase mandated by "nosmp" option */ if (max_cpus == 0) return; @@ -142,8 +144,8 @@ asmlinkage __visible void __init smp_cal current->active_mm = mm;
trap_init(); + store_cpu_topology(smp_processor_id()); notify_cpu_starting(smp_processor_id()); - update_siblings_masks(smp_processor_id()); set_cpu_online(smp_processor_id(), 1); /* * Remote TLB flushes are ignored while the CPU is offline, so emit
From: Werner Sembach wse@tuxedocomputers.com
commit 3dbc80a3e4c55c4a5b89ef207bed7b7de36157b4 upstream.
This commit is very different from the upstream commit! It fixes the same issue by adding more quirks, rather than with the general fix from the 6.1 kernel, because that fix is part of a larger refactoring of the backlight code which is not suitable for the stable series.
As described in "ACPI: video: Drop NL5x?U, PF4NU1F and PF5?U?? acpi_backlight=native quirks" (10212754a0d2) the upstream commit "ACPI: video: Make backlight class device registration a separate step (v2)" (3dbc80a3e4c5) makes these quirks unnecessary. However as mentioned in this bugtracker ticket https://bugzilla.kernel.org/show_bug.cgi?id=215683#c17 the upstream fix is part of a larger patchset that is overall too complex for stable.
The TongFang GKxNRxx, GMxNGxx, GMxZGxx, and GMxRGxx / TUXEDO Stellaris/Polaris Gen 1-4, have the same problem as the Clevo NL5xRU and NL5xNU / TUXEDO Aura 15 Gen1 and Gen2: They have a working native and video interface for screen backlight. However the default detection mechanism first registers the video interface before unregistering it again and switching to the native interface during boot. This results in a dangling SBIOS request for backlight change for some reason, causing the backlight to switch to ~2% once per boot on the first power cord connect or disconnect event. Setting the native interface explicitly circumvents this buggy behaviour by avoiding the unregistering process.
Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Werner Sembach wse@tuxedocomputers.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/video_detect.c | 64 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+)
--- a/drivers/acpi/video_detect.c +++ b/drivers/acpi/video_detect.c @@ -464,6 +464,70 @@ static const struct dmi_system_id video_ }, }, /* + * More Tongfang devices with the same issue as the Clevo NL5xRU and + * NL5xNU/TUXEDO Aura 15 Gen1 and Gen2. See the description above. + */ + { + .callback = video_detect_force_native, + .ident = "TongFang GKxNRxx", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GKxNRxx"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang GKxNRxx", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "POLARIS1501A1650TI"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang GKxNRxx", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "POLARIS1501A2060"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang GKxNRxx", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "POLARIS1701A1650TI"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang GKxNRxx", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "POLARIS1701A2060"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang GMxNGxx", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxNGxx"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang GMxZGxx", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxZGxx"), + }, + }, + { + .callback = video_detect_force_native, + .ident = "TongFang GMxRGxx", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + /* * Desktops which falsely report a backlight and which our heuristics * for this do not catch. */
From: Nick Desaulniers ndesaulniers@google.com
This is _not_ an upstream commit; it is for 5.4.y only. It is based on commit 32ef9e5054ec0321b9336058c58ec749e9c6b0fe upstream.
Alexey reported that the fraction of unknown filename instances in kallsyms recently grew from ~0.3% to ~10%; Bill and Greg tracked it down to assembler-defined symbols, which regressed as a result of:
commit b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
In that commit, I alluded to restoring debug info for assembler-defined symbols in a follow-up patch, but it seems I forgot to do so in
commit a66049e2cf0e ("Kbuild: make DWARF version a choice")
Fixes: b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
Signed-off-by: Nick Desaulniers ndesaulniers@google.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/Makefile
+++ b/Makefile
@@ -802,7 +802,9 @@ DEBUG_CFLAGS += -gsplit-dwarf
 else
 DEBUG_CFLAGS += -g
 endif
-ifneq ($(LLVM_IAS),1)
+ifeq ($(LLVM_IAS),1)
+KBUILD_AFLAGS += -g
+else
 KBUILD_AFLAGS += -Wa,-gdwarf-2
 endif
 endif
From: Gaurav Kohli gauravkohli@linux.microsoft.com
commit 365e1ececb2905f94cc10a5817c5b644a32a3ae2 upstream.
During VM boot, there is a possibility that the VF registration call arrives before the VF association message from the host to the VM.

This can break the netvsc VF path. To prevent that, block VF registration until the VF bind message comes from the host.
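For context, the patch serializes the two paths with the kernel's completion API from <linux/completion.h>: probe initializes a completion, the host's association message completes it, the VF-up path waits on it, and VF teardown re-arms it for the next cycle. A minimal self-contained sketch of that pattern follows; the struct and function names are simplified stand-ins, not the driver's own:

#include <linux/completion.h>
#include <linux/types.h>

struct vf_state {
	struct completion vf_add;	/* signalled once the host associates the VF */
	bool vf_alloc;			/* host has allocated/associated the VF */
};

static void vf_state_init(struct vf_state *st)
{
	init_completion(&st->vf_add);	/* starts out "not done" */
}

/* host-message path: the VF association message arrived */
static void on_vf_association(struct vf_state *st)
{
	st->vf_alloc = true;
	complete(&st->vf_add);		/* wake a waiter in on_vf_up() */
}

/* VF-up path: do not switch the datapath before the association */
static void on_vf_up(struct vf_state *st)
{
	if (!st->vf_alloc)
		wait_for_completion(&st->vf_add);
	/* ... safe to switch the datapath to the VF here ... */
}

/* VF teardown: re-arm the completion for the next VF cycle */
static void on_vf_unregister(struct vf_state *st)
{
	reinit_completion(&st->vf_add);
}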
Cc: stable@vger.kernel.org
Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number")
Reviewed-by: Haiyang Zhang haiyangz@microsoft.com
Signed-off-by: Gaurav Kohli gauravkohli@linux.microsoft.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/hyperv/hyperv_net.h |  3 +++
 drivers/net/hyperv/netvsc.c     |  4 ++++
 drivers/net/hyperv/netvsc_drv.c | 20 ++++++++++++++++++++
 3 files changed, 27 insertions(+)
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -954,6 +954,9 @@ struct net_device_context {
 	u32 vf_alloc;
 	/* Serial number of the VF to team with */
 	u32 vf_serial;
+
+	/* completion variable to confirm vf association */
+	struct completion vf_add;
 };
 
 /* Per channel data */

--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1223,6 +1223,10 @@ static void netvsc_send_vf(struct net_de
 	net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
 	net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+
+	if (net_device_ctx->vf_alloc)
+		complete(&net_device_ctx->vf_add);
+
 	netdev_info(ndev, "VF slot %u %s\n",
 		    net_device_ctx->vf_serial,
 		    net_device_ctx->vf_alloc ? "added" : "removed");

--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2133,6 +2133,7 @@ static struct net_device *get_netvsc_bys
 {
 	struct device *parent = vf_netdev->dev.parent;
 	struct net_device_context *ndev_ctx;
+	struct net_device *ndev;
 	struct pci_dev *pdev;
 	u32 serial;
@@ -2159,6 +2160,18 @@ static struct net_device *get_netvsc_bys
 		return hv_get_drvdata(ndev_ctx->device_ctx);
 	}
+	/* Fallback path to check synthetic vf with
+	 * help of mac addr
+	 */
+	list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
+		ndev = hv_get_drvdata(ndev_ctx->device_ctx);
+		if (ether_addr_equal(vf_netdev->perm_addr, ndev->perm_addr)) {
+			netdev_notice(vf_netdev,
+				      "falling back to mac addr based matching\n");
+			return ndev;
+		}
+	}
+
 	netdev_notice(vf_netdev,
 		      "no netdev found for vf serial:%u\n", serial);
 	return NULL;

@@ -2232,6 +2245,11 @@ static int netvsc_vf_changed(struct net_
 	if (!netvsc_dev)
 		return NOTIFY_DONE;
+	if (vf_is_up && !net_device_ctx->vf_alloc) {
+		netdev_info(ndev, "Waiting for the VF association from host\n");
+		wait_for_completion(&net_device_ctx->vf_add);
+	}
+
 	netvsc_switch_datapath(ndev, vf_is_up);
 	netdev_info(ndev, "Data path switched %s VF: %s\n",
 		    vf_is_up ? "to" : "from", vf_netdev->name);

@@ -2253,6 +2271,7 @@ static int netvsc_unregister_vf(struct n
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
+	reinit_completion(&net_device_ctx->vf_add);
 	netdev_rx_handler_unregister(vf_netdev);
 	netdev_upper_dev_unlink(vf_netdev, ndev);
 	RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL);

@@ -2290,6 +2309,7 @@ static int netvsc_probe(struct hv_device
INIT_DELAYED_WORK(&net_device_ctx->dwork, netvsc_link_change);
+	init_completion(&net_device_ctx->vf_add);
 	spin_lock_init(&net_device_ctx->lock);
 	INIT_LIST_HEAD(&net_device_ctx->reconfig_events);
 	INIT_DELAYED_WORK(&net_device_ctx->vf_takeover, netvsc_vf_setup);
From: Seth Jenkins sethjenkins@google.com
Commit 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") introduced a NULL dereference in show_smaps_rollup() when the task has no VMAs.
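A minimal sketch of the failure and the guard, assuming the 5.4-era layout where mm->mmap points to the task's first VMA (NULL when there are none); this is simplified from show_smaps_rollup(), and show_vma_header_prefix() is task_mmu.c's local helper, not code introduced here:

#include <linux/mm_types.h>
#include <linux/seq_file.h>

static void rollup_header(struct seq_file *m, struct mm_struct *mm,
			  unsigned long last_vma_end)
{
	/* before the fix: mm->mmap->vm_start faulted whenever the VMA
	 * list was empty, i.e. mm->mmap == NULL
	 */
	unsigned long start = mm->mmap ? mm->mmap->vm_start : 0;

	show_vma_header_prefix(m, start, last_vma_end, 0, 0, 0, 0);
}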
Fixes: 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file")
Signed-off-by: Seth Jenkins sethjenkins@google.com
Reviewed-by: Alexey Dobriyan adobriyan@gmail.com
Tested-by: Alexey Dobriyan adobriyan@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/proc/task_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -884,7 +884,7 @@ static int show_smaps_rollup(struct seq_
 		last_vma_end = vma->vm_end;
 	}
-	show_vma_header_prefix(m, priv->mm->mmap->vm_start,
+	show_vma_header_prefix(m, priv->mm->mmap ? priv->mm->mmap->vm_start : 0,
 			       last_vma_end, 0, 0, 0, 0);
 	seq_pad(m, ' ');
 	seq_puts(m, "[rollup]\n");
Hi Greg,
On Thu, Oct 27, 2022 at 06:55:48PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.221 release.
> There are 53 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat, 29 Oct 2022 16:50:35 +0000.
> Anything received after that time might be too late.
Build test (gcc version 11.3.1 20221016):
mips:    65 configs -> no failure
arm:    106 configs -> no failure
arm64:    2 configs -> no failure
x86_64:   4 configs -> no failure
alpha    allmodconfig -> no failure
powerpc  allmodconfig -> no failure
riscv    allmodconfig -> no failure
s390     allmodconfig -> no failure
xtensa   allmodconfig -> no failure
Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
[1]. https://openqa.qa.codethink.co.uk/tests/2046
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
On Thu, 27 Oct 2022 at 22:38, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
> This is the start of the stable review cycle for the 5.4.221 release.
> There are 53 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat, 29 Oct 2022 16:50:35 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.221-rc1...
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Results from Linaro's test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 5.4.221-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.4.y
* git commit: f98a212520a532dcf91de4a5784c6b29b0da8874
* git describe: v5.4.220-54-gf98a212520a5
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.22...
## No Test Regressions (compared to v5.4.219-256-gf49f12b65484)
## No Metric Regressions (compared to v5.4.219-256-gf49f12b65484)
## No Test Fixes (compared to v5.4.219-256-gf49f12b65484)
## No Metric Fixes (compared to v5.4.219-256-gf49f12b65484)
## Test result summary
total: 117476, pass: 99333, fail: 2165, skip: 15484, xfail: 494
## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 334 total, 334 passed, 0 failed
* arm64: 64 total, 59 passed, 5 failed
* i386: 31 total, 29 passed, 2 failed
* mips: 56 total, 56 passed, 0 failed
* parisc: 12 total, 12 passed, 0 failed
* powerpc: 63 total, 63 passed, 0 failed
* riscv: 27 total, 26 passed, 1 failed
* s390: 15 total, 15 passed, 0 failed
* sh: 24 total, 24 passed, 0 failed
* sparc: 12 total, 12 passed, 0 failed
* x86_64: 57 total, 55 passed, 2 failed
## Test suites summary
* fwts
* igt-gpu-tools
* kselftest-android
* kselftest-arm64
* kselftest-arm64/arm64.btitest.bti_c_func
* kselftest-arm64/arm64.btitest.bti_j_func
* kselftest-arm64/arm64.btitest.bti_jc_func
* kselftest-arm64/arm64.btitest.bti_none_func
* kselftest-arm64/arm64.btitest.nohint_func
* kselftest-arm64/arm64.btitest.paciasp_func
* kselftest-arm64/arm64.nobtitest.bti_c_func
* kselftest-arm64/arm64.nobtitest.bti_j_func
* kselftest-arm64/arm64.nobtitest.bti_jc_func
* kselftest-arm64/arm64.nobtitest.bti_none_func
* kselftest-arm64/arm64.nobtitest.nohint_func
* kselftest-arm64/arm64.nobtitest.paciasp_func
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-drivers-dma-buf
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-net-forwarding
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-open-posix-tests
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* network-basic-tests
* packetdrill
* perf
* perf/Zstd-perf.data-compression
* rcutorture
* v4l2-compliance
* vdso
--
Linaro LKFT
https://lkft.linaro.org
On 10/27/22 09:55, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.221 release.
> There are 53 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat, 29 Oct 2022 16:50:35 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.221-rc1...
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Thu, Oct 27, 2022 at 06:55:48PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.221 release.
> There are 53 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat, 29 Oct 2022 16:50:35 +0000.
> Anything received after that time might be too late.
Build results:
	total: 161 pass: 161 fail: 0
Qemu test results:
	total: 447 pass: 447 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter