March 2022 - Linux-stable-mirror

[PATCH] f2fs: fix to do sanity check on .cp_pack_total_block_count

by Chao Yu

As bughunter reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=215709

f2fs may hang when mounting a fuzzed image, the dmesg shows as below:

__filemap_get_folio+0x3a9/0x590
pagecache_get_page+0x18/0x60
__get_meta_page+0x95/0x460 [f2fs]
get_checkpoint_version+0x2a/0x1e0 [f2fs]
validate_checkpoint+0x8e/0x2a0 [f2fs]
f2fs_get_valid_checkpoint+0xd0/0x620 [f2fs]
f2fs_fill_super+0xc01/0x1d40 [f2fs]
mount_bdev+0x18a/0x1c0
f2fs_mount+0x15/0x20 [f2fs]
legacy_get_tree+0x28/0x50
vfs_get_tree+0x27/0xc0
path_mount+0x480/0xaa0
do_mount+0x7c/0xa0
__x64_sys_mount+0x8b/0xe0
do_syscall_64+0x38/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae

The root cause is that the cp_pack_total_block_count field in the
checkpoint was fuzzed to one. As calculated from that value, the two cp
pack blocks are located at the same block address, so reading the latter
cp pack block blocks on the page lock, because the lock is already held
from reading the previous cp pack block. Fix it by adding a sanity check
for cp_pack_total_block_count.

Cc: stable(a)vger.kernel.org
Signed-off-by: Chao Yu <chao.yu(a)oppo.com>
---
 fs/f2fs/checkpoint.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 871eee35a32f..aba1b8a1ce66 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -875,6 +875,7 @@ static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
 	struct page *cp_page_1 = NULL, *cp_page_2 = NULL;
 	struct f2fs_checkpoint *cp_block = NULL;
 	unsigned long long cur_version = 0, pre_version = 0;
+	unsigned int cp_blocks;
 	int err;
 
 	err = get_checkpoint_version(sbi, cp_addr, &cp_block,
@@ -882,15 +883,16 @@ static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
 	if (err)
 		return NULL;
 
-	if (le32_to_cpu(cp_block->cp_pack_total_block_count) >
-					sbi->blocks_per_seg) {
+	cp_blocks = le32_to_cpu(cp_block->cp_pack_total_block_count);
+
+	if (cp_blocks > sbi->blocks_per_seg || cp_blocks <= F2FS_CP_PACKS) {
 		f2fs_warn(sbi, "invalid cp_pack_total_block_count:%u",
 			  le32_to_cpu(cp_block->cp_pack_total_block_count));
 		goto invalid_cp;
 	}
 	pre_version = *version;
 
-	cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1;
+	cp_addr += cp_blocks - 1;
 	err = get_checkpoint_version(sbi, cp_addr, &cp_block,
 					&cp_page_2, version);
 	if (err)
-- 
2.25.1
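
The range check in the patch above can be looked at in isolation. What follows is a minimal userspace sketch, not the kernel code: the helper names and the SKETCH_CP_PACKS constant are hypothetical, and SKETCH_CP_PACKS is assumed to mirror F2FS_CP_PACKS with a value of 2. It shows why a fuzzed count of one is a problem: the address of the last cp pack block collapses onto the address of the first one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for F2FS_CP_PACKS, taken to be 2 in this sketch. */
#define SKETCH_CP_PACKS 2u

/* Hypothetical helper: true when the on-disk count passes the patched check. */
static bool cp_pack_count_is_sane(uint32_t cp_blocks, uint32_t blocks_per_seg)
{
	return cp_blocks <= blocks_per_seg && cp_blocks > SKETCH_CP_PACKS;
}

/* Hypothetical helper: address of the last block of a cp pack. */
static uint32_t last_cp_block_addr(uint32_t cp_addr, uint32_t cp_blocks)
{
	return cp_addr + cp_blocks - 1;
}
```

With cp_blocks equal to one, last_cp_block_addr() returns cp_addr itself, which is the same-address situation the commit message describes; the sanity check rejects that value before the second read is attempted.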


[PATCH V2,1/2] mm: madvise: return correct bytes advised with process_madvise

by Charan Teja Kalla

The process_madvise() system call returns an error even after it has
processed some of the VMAs passed in the 'struct iovec' vector list,
which leaves the user unsure where to restart the advise next. This also
contradicts the syscall's man page[1], which states that the "return
value may be less than the total number of requested bytes, if an error
occurred after some iovec elements were already processed.".

Consider a user who passes 10 VMAs in the 'struct iovec' vector list, of
which 9 are processed and one fails. The call then returns only the error
caused by the failed VMA, despite the first 9 VMAs having been processed,
leaving the user unsure which VMA it failed on. Returning the number of
bytes processed here helps the user identify the VMA that failed, so the
advise on that VMA can be retried or skipped.

[1]https://man7.org/linux/man-pages/man2/process_madvise.2.html.

Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Cc: <stable(a)vger.kernel.org> # 5.10+
Signed-off-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
---
Changes in V2:
 -- Separated the ENOMEM handling and return bytes processed, as per Minchan comments.
 -- This contains correcting return bytes processed with process_madvise().

Changes in V1:
 -- Fixed the ENOMEM handling and return bytes processed by process_madvise.
 -- https://patchwork.kernel.org/project/linux-mm/patch/1646803679-11433-1-git-…

 mm/madvise.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 38d0f51..e97e6a9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1433,8 +1433,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		iov_iter_advance(&iter, iovec.iov_len);
 	}
 
-	if (ret == 0)
-		ret = total_len - iov_iter_count(&iter);
+	ret = (total_len - iov_iter_count(&iter)) ? : ret;
 
 release_mm:
 	mmput(mm);
-- 
2.7.4
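
The replacement line in this patch uses the GNU C conditional with an omitted middle operand: "a ? : b" evaluates to a when a is non-zero and to b otherwise. A minimal sketch of the same decision in standard C follows; the function name and its parameters are hypothetical stand-ins for total_len, iov_iter_count(&iter) and ret, not kernel symbols.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the patched return logic:
 *   total_len - bytes requested across all iovec elements
 *   remaining - bytes not yet consumed when the loop stopped
 *   err       - 0 on success, or a negative errno from the failing element
 * If any bytes were processed, report that count; otherwise report err.
 */
static long madvise_result(size_t total_len, size_t remaining, long err)
{
	size_t done = total_len - remaining;

	return done ? (long)done : err;
}
```

Under this sketch a call that fails on its very first element still returns the error, while a call that fails part way through returns the byte count processed so far, which matches the behaviour the man page quote describes.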


[PATCH 5.4 00/43] 5.4.185-rc1 review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 5.4.185 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.185-rc…
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
    Linux 5.4.185-rc1

Krish Sadhukhan <krish.sadhukhan(a)oracle.com>
    KVM: SVM: Don't flush cache if hardware enforces cache coherency across encryption domains

Krish Sadhukhan <krish.sadhukhan(a)oracle.com>
    x86/mm/pat: Don't flush cache if hardware enforces cache coherency across encryption domnains

Krish Sadhukhan <krish.sadhukhan(a)oracle.com>
    x86/cpu: Add hardware-enforced cache coherency as a CPUID feature

Borislav Petkov <bp(a)suse.de>
    x86/cpufeatures: Mark two free bits in word 3

Josh Triplett <josh(a)joshtriplett.org>
    ext4: add check to prevent attempting to resize an fs with sparse_super2

Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
    ARM: fix Thumb2 regression with Spectre BHB

Michael S. Tsirkin <mst(a)redhat.com>
    virtio: acknowledge all features before access

Michael S. Tsirkin <mst(a)redhat.com>
    virtio: unexport virtio_finalize_features

Pali Rohár <pali(a)kernel.org>
    arm64: dts: marvell: armada-37xx: Remap IO space to bus address 0x0

Emil Renner Berthing <kernel(a)esmil.dk>
    riscv: Fix auipc+jalr relocation range checks

Rong Chen <rong.chen(a)amlogic.com>
    mmc: meson: Fix usage of meson_mmc_post_req()

Robert Hancock <robert.hancock(a)calian.com>
    net: macb: Fix lost RX packet wakeup race in NAPI receive

Dan Carpenter <dan.carpenter(a)oracle.com>
    staging: gdm724x: fix use after free in gdm_lte_rx()

Miklos Szeredi <mszeredi(a)redhat.com>
    fuse: fix pipe buffer lifetime for direct_io

Randy Dunlap <rdunlap(a)infradead.org>
    ARM: Spectre-BHB: provide empty stub for non-config

Mike Kravetz <mike.kravetz(a)oracle.com>
    selftests/memfd: clean up mapping in mfd_fail_write

Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
    selftest/vm: fix map_fixed_noreplace test failure

Sven Schnelle <svens(a)linux.ibm.com>
    tracing: Ensure trace buffer is at least 4096 bytes large

Niels Dossche <dossche.niels(a)gmail.com>
    ipv6: prevent a possible race condition with lifetimes

Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
    Revert "xen-netback: Check for hotplug-status existence before watching"

Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
    Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"

suresh kumar <suresh2514(a)gmail.com>
    net-sysfs: add check for netdevice being present to speed_show

Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
    selftests/bpf: Add test for bpf_timer overwriting crash

Jeremy Linton <jeremy.linton(a)arm.com>
    net: bcmgenet: Don't claim WOL when its not available

Eric Dumazet <edumazet(a)google.com>
    sctp: fix kernel-infoleak for SCTP sockets

Clément Léger <clement.leger(a)bootlin.com>
    net: phy: DP83822: clear MISR2 register to disable interrupts

Miaoqian Lin <linmq006(a)gmail.com>
    gianfar: ethtool: Fix refcount leak in gfar_get_ts_info

Mark Featherston <mark(a)embeddedTS.com>
    gpio: ts4900: Do not set DAT and OE together

Guillaume Nault <gnault(a)redhat.com>
    selftests: pmtu.sh: Kill tcpdump processes launched by subshell.

Pavel Skripkin <paskripkin(a)gmail.com>
    NFC: port100: fix use-after-free in port100_send_complete

Moshe Shemesh <moshe(a)nvidia.com>
    net/mlx5: Fix a race on command flush flow

Mohammad Kabat <mohammadkab(a)nvidia.com>
    net/mlx5: Fix size field in bufferx_reg struct

Duoming Zhou <duoming(a)zju.edu.cn>
    ax25: Fix NULL pointer dereference in ax25_kill_by_device

Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
    net: ethernet: lpc_eth: Handle error for clk_enable

Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
    net: ethernet: ti: cpts: Handle error for clk_enable

Miaoqian Lin <linmq006(a)gmail.com>
    ethernet: Fix error handling in xemaclite_of_probe

Joel Stanley <joel(a)jms.id.au>
    ARM: dts: aspeed: Fix AST2600 quad spi group

Jernej Skrabec <jernej.skrabec(a)gmail.com>
    drm/sun4i: mixer: Fix P010 and P210 format numbers

Tom Rix <trix(a)redhat.com>
    qed: return status of qed_iov_get_link

Jia-Ju Bai <baijiaju1990(a)gmail.com>
    net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare()

Xie Yongji <xieyongji(a)bytedance.com>
    virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero

Pali Rohár <pali(a)kernel.org>
    arm64: dts: armada-3720-turris-mox: Add missing ethernet0 alias

Taniya Das <tdas(a)codeaurora.org>
    clk: qcom: gdsc: Add support to update GDSC transition delay

-------------

Diffstat:

 Makefile | 4 +-
 arch/arm/boot/dts/aspeed-g6-pinctrl.dtsi | 2 +-
 arch/arm/include/asm/spectre.h | 6 +++
 arch/arm/kernel/entry-armv.S | 4 +-
 .../boot/dts/marvell/armada-3720-turris-mox.dts | 8 +++-
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +-
 arch/riscv/kernel/module.c | 21 +++++++--
 arch/x86/include/asm/cpufeatures.h | 2 +
 arch/x86/kernel/cpu/scattered.c | 1 +
 arch/x86/kvm/svm.c | 3 +-
 arch/x86/mm/pageattr.c | 2 +-
 drivers/block/virtio_blk.c | 10 +++-
 drivers/clk/qcom/gdsc.c | 26 +++++++++--
 drivers/clk/qcom/gdsc.h | 8 +++-
 drivers/gpio/gpio-ts4900.c | 24 ++++++++--
 drivers/gpu/drm/sun4i/sun8i_mixer.h | 8 ++--
 drivers/mmc/host/meson-gx-mmc.c | 15 +++---
 drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 7 +++
 drivers/net/ethernet/cadence/macb_main.c | 25 +++++++++-
 drivers/net/ethernet/freescale/gianfar_ethtool.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 15 +++---
 drivers/net/ethernet/nxp/lpc_eth.c | 5 +-
 drivers/net/ethernet/qlogic/qed/qed_sriov.c | 18 +++++---
 drivers/net/ethernet/qlogic/qed/qed_vf.c | 7 +++
 drivers/net/ethernet/ti/cpts.c | 4 +-
 drivers/net/ethernet/xilinx/xilinx_emaclite.c | 4 +-
 drivers/net/phy/dp83822.c | 2 +-
 drivers/net/xen-netback/xenbus.c | 13 ++----
 drivers/nfc/port100.c | 2 +
 drivers/staging/gdm724x/gdm_lte.c | 5 +-
 drivers/virtio/virtio.c | 39 ++++++++--------
 fs/ext4/resize.c | 5 ++
 fs/fuse/dev.c | 12 ++++-
 fs/fuse/file.c | 1 +
 fs/fuse/fuse_i.h | 1 +
 include/linux/mlx5/mlx5_ifc.h | 4 +-
 include/linux/virtio.h | 1 -
 include/linux/virtio_config.h | 3 +-
 kernel/trace/trace.c | 10 ++--
 net/ax25/af_ax25.c | 7 +++
 net/core/net-sysfs.c | 2 +-
 net/ipv6/addrconf.c | 2 +
 net/sctp/diag.c | 9 ++--
 .../testing/selftests/bpf/prog_tests/timer_crash.c | 32 +++++++++++++
 tools/testing/selftests/bpf/progs/timer_crash.c | 54 ++++++++++++++++++++++
 tools/testing/selftests/memfd/memfd_test.c | 1 +
 tools/testing/selftests/net/pmtu.sh | 7 ++-
 tools/testing/selftests/vm/map_fixed_noreplace.c | 49 +++++++++++++++-----
 48 files changed, 378 insertions(+), 115 deletions(-)


Re: [PATCH 5.10 37/71] selftests/bpf: Add test for bpf_timer overwriting crash

by gregkh＠linuxfoundation.org

On Fri, Mar 18, 2022 at 02:42:49PM +0000, Geliang Tang wrote:
> Hi Greg,
>
> I got this bpf selftests build break today on the stable branch 5.10.106:
>
> =========================================================================
>   CLNG-LLC [test_maps] test_tracepoint.o
> progs/timer_crash.c:8:19: error: field has incomplete type 'struct bpf_timer'
>         struct bpf_timer timer;
>                          ^
> progs/timer_crash.c:8:9: note: forward declaration of 'struct bpf_timer'
>         struct bpf_timer timer;
>                ^
> progs/timer_crash.c:35:6: warning: implicit declaration of function 'bpf_get_current_task_btf' is invalid in C99 [-Wimplicit-function-declaration]
>         if (bpf_get_current_task_btf()->tgid != pid)
>             ^
> progs/timer_crash.c:35:34: error: member reference type 'int' is not a pointer
>         if (bpf_get_current_task_btf()->tgid != pid)
>             ~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
> progs/timer_crash.c:49:3: warning: implicit declaration of function 'bpf_timer_cancel' is invalid in C99 [-Wimplicit-function-declaration]
>                 bpf_timer_cancel(&e->timer);
>                 ^
> 2 warnings and 2 errors generated.
>   CLNG-LLC [test_maps] test_trace_ext_tracing.o
> llc: error: llc: <stdin>:1:1: error: expected top-level entity
> BPF obj compilation failed
> ^
> make: *** [Makefile:402: tools/testing/selftests/bpf/timer_crash.o] Error 1
> make: *** Waiting for unfinished jobs....
>   CLNG-LLC [test_maps] test_trace_ext.o
> =========================================================================
>
> It is introduced by this commit, "selftests/bpf: Add test for bpf_timer
> overwriting crash". Since the commit "bpf: Introduce bpf timers." has not
> been merged into the stable branch yet.
>
> I am writing to you to report this bug.
>

Now reverted, thanks!

greg k-h


FAILED: patch "[PATCH] esp: Fix possible buffer overflow in ESP transformation" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From ebe48d368e97d007bfeb76fcb065d6cfc4c96645 Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert(a)secunet.com>
Date: Mon, 7 Mar 2022 13:11:39 +0100
Subject: [PATCH] esp: Fix possible buffer overflow in ESP transformation

The maximum message size that can be sent is bigger than the maximum
size that skb_page_frag_refill can allocate. So it is possible to write
beyond the allocated buffer.

Fix this by doing a fallback to COW in that case.

v2:

Avoid get_order() costs as suggested by Linus Torvalds.

Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <sec(a)valis.email>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>

diff --git a/include/net/esp.h b/include/net/esp.h
index 9c5637d41d95..90cd02ff77ef 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -4,6 +4,8 @@
 
 #include <linux/skbuff.h>
 
+#define ESP_SKB_FRAG_MAXSIZE (PAGE_SIZE << SKB_FRAG_PAGE_ORDER)
+
 struct ip_esp_hdr;
 
 static inline struct ip_esp_hdr *ip_esp_hdr(const struct sk_buff *skb)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index e1b1d080e908..70e6c87fbe3d 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -446,6 +446,7 @@ int esp_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *
 	struct page *page;
 	struct sk_buff *trailer;
 	int tailen = esp->tailen;
+	unsigned int allocsz;
 
 	/* this is non-NULL only with TCP/UDP Encapsulation */
 	if (x->encap) {
@@ -455,6 +456,10 @@ int esp_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *
 			return err;
 	}
 
+	allocsz = ALIGN(skb->data_len + tailen, L1_CACHE_BYTES);
+	if (allocsz > ESP_SKB_FRAG_MAXSIZE)
+		goto cow;
+
 	if (!skb_cloned(skb)) {
 		if (tailen <= skb_tailroom(skb)) {
 			nfrags = 1;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 7591160edce1..b0ffbcd5432d 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -482,6 +482,7 @@ int esp6_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info
 	struct page *page;
 	struct sk_buff *trailer;
 	int tailen = esp->tailen;
+	unsigned int allocsz;
 
 	if (x->encap) {
 		int err = esp6_output_encap(x, skb, esp);
@@ -490,6 +491,10 @@ int esp6_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info
 			return err;
 	}
 
+	allocsz = ALIGN(skb->data_len + tailen, L1_CACHE_BYTES);
+	if (allocsz > ESP_SKB_FRAG_MAXSIZE)
+		goto cow;
+
 	if (!skb_cloned(skb)) {
 		if (tailen <= skb_tailroom(skb)) {
 			nfrags = 1;
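
The size check added in both esp_output_head() and esp6_output_head() can be sketched outside the kernel. This is only an illustration under stated assumptions: the SKETCH_* constants and both helper names are hypothetical, with a 4096-byte page, a fragment page order of 3 and a 64-byte cache line assumed purely for the arithmetic; the real values depend on the architecture and kernel configuration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only; not taken from the patch. */
#define SKETCH_PAGE_SIZE       4096u
#define SKETCH_FRAG_PAGE_ORDER 3u
#define SKETCH_L1_CACHE_BYTES  64u
#define SKETCH_FRAG_MAXSIZE    (SKETCH_PAGE_SIZE << SKETCH_FRAG_PAGE_ORDER)

/* Round n up to a multiple of a, where a is a power of two. */
static size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: true when the aligned payload plus trailer would
 * not fit in the largest page fragment, which is the case where the
 * patched code takes the "goto cow" fallback.
 */
static bool esp_needs_cow(size_t data_len, size_t tailen)
{
	return align_up(data_len + tailen, SKETCH_L1_CACHE_BYTES) >
	       SKETCH_FRAG_MAXSIZE;
}
```

With these assumed constants the limit is 32768 bytes, so a request that aligns to exactly 32768 stays on the fast path and anything larger falls back to copy-on-write.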


smsc95xx: Commits for 5.10 stable inclusion

by Fabio Estevam

Hi,

I would like to request the following patches to be included into the
stable 5.10 tree:

a049a30fc27c ("net: usb: Correct PHY handling of smsc95xx")
0bf3885324a8 ("net: usb: Correct reset handling of smsc95xx")
c70c453abcbf ("smsc95xx: Ignore -ENODEV errors when device is unplugged")

They are already present in 5.15 and 5.16 and they fix real issues on
5.10 too.

I have been running 5.10 with these 3 patches applied locally and no
reboot/disconnect errors are seen anymore.

Alexander Stein also sees an smsc95xx suspend/resume issue fixed in 5.10
with the series applied.

Thanks,

Fabio Estevam


[PATCH stable-5.15.y, stable-5.16.y] btrfs: skip reserved bytes warning on unmount after log cleanup failure

by Anand Jain

From: Filipe Manana <fdmanana(a)suse.com>

Commit 40cdc509877bacb438213b83c7541c5e24a1d9ec upstream

After the recent changes made by commit c2e39305299f01 ("btrfs: clear
extent buffer uptodate when we fail to write it") and its followup fix,
commit 651740a5024117 ("btrfs: check WRITE_ERR when trying to read an
extent buffer"), we can now end up not cleaning up space reservations of
log tree extent buffers after a transaction abort happens, as well as not
cleaning up still dirty extent buffers.

This happens because if writeback for a log tree extent buffer failed,
then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer
and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on,
when trying to free the log tree with free_log_tree(), which iterates
over the tree, we can end up getting an -EIO error when trying to read
a node or a leaf, since read_extent_buffer_pages() returns -EIO if an
extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the
EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return
immediately as we can not iterate over the entire tree.

In that case we never update the reserved space for an extent buffer in
the respective block group and space_info object.

When this happens we get the following traces when unmounting the fs:

[174957.284509] BTRFS: error (device dm-0) in cleanup_transaction:1913: errno=-5 IO failure
[174957.286497] BTRFS: error (device dm-0) in free_log_tree:3420: errno=-5 IO failure
[174957.399379] ------------[ cut here ]------------
[174957.402497] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:127 btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.407523] Modules linked in: btrfs overlay dm_zero (...)
[174957.424917] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.426689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.428716] RIP: 0010:btrfs_put_block_group+0x77/0xb0 [btrfs]
[174957.429717] Code: 21 48 8b bd (...)
[174957.432867] RSP: 0018:ffffb70d41cffdd0 EFLAGS: 00010206
[174957.433632] RAX: 0000000000000001 RBX: ffff8b09c3848000 RCX: ffff8b0758edd1c8
[174957.434689] RDX: 0000000000000001 RSI: ffffffffc0b467e7 RDI: ffff8b0758edd000
[174957.436068] RBP: ffff8b0758edd000 R08: 0000000000000000 R09: 0000000000000000
[174957.437114] R10: 0000000000000246 R11: 0000000000000000 R12: ffff8b09c3848148
[174957.438140] R13: ffff8b09c3848198 R14: ffff8b0758edd188 R15: dead000000000100
[174957.439317] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.440402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.441164] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.442117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.443076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.443948] Call Trace:
[174957.444264] <TASK>
[174957.444538] btrfs_free_block_groups+0x255/0x3c0 [btrfs]
[174957.445238] close_ctree+0x301/0x357 [btrfs]
[174957.445803] ? call_rcu+0x16c/0x290
[174957.446250] generic_shutdown_super+0x74/0x120
[174957.446832] kill_anon_super+0x14/0x30
[174957.447305] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.447890] deactivate_locked_super+0x31/0xa0
[174957.448440] cleanup_mnt+0x147/0x1c0
[174957.448888] task_work_run+0x5c/0xa0
[174957.449336] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.449934] syscall_exit_to_user_mode+0x16/0x40
[174957.450512] do_syscall_64+0x48/0xc0
[174957.450980] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.451605] RIP: 0033:0x7f328fdc4a97
[174957.452059] Code: 03 0c 00 f7 (...)
[174957.454320] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.455262] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.456131] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.457118] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.458005] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.459113] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.460193] </TASK>
[174957.460534] irq event stamp: 0
[174957.461003] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.461947] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.463147] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.465116] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.466323] ---[ end trace bc7ee0c490bce3af ]---
[174957.467282] ------------[ cut here ]------------
[174957.468184] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:3976 btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.470066] Modules linked in: btrfs overlay dm_zero (...)
[174957.483137] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1
[174957.484691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[174957.486853] RIP: 0010:btrfs_free_block_groups+0x330/0x3c0 [btrfs]
[174957.488050] Code: 00 00 00 ad de (...)
[174957.491479] RSP: 0018:ffffb70d41cffde0 EFLAGS: 00010206
[174957.492520] RAX: ffff8b08d79310b0 RBX: ffff8b09c3848000 RCX: 0000000000000000
[174957.493868] RDX: 0000000000000001 RSI: fffff443055ee600 RDI: ffffffffb1131846
[174957.495183] RBP: ffff8b08d79310b0 R08: 0000000000000000 R09: 0000000000000000
[174957.496580] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8b08d7931000
[174957.498027] R13: ffff8b09c38492b0 R14: dead000000000122 R15: dead000000000100
[174957.499438] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
[174957.500990] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[174957.502117] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
[174957.503513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[174957.504864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[174957.506167] Call Trace:
[174957.506654] <TASK>
[174957.507047] close_ctree+0x301/0x357 [btrfs]
[174957.507867] ? call_rcu+0x16c/0x290
[174957.508567] generic_shutdown_super+0x74/0x120
[174957.509447] kill_anon_super+0x14/0x30
[174957.510194] btrfs_kill_super+0x12/0x20 [btrfs]
[174957.511123] deactivate_locked_super+0x31/0xa0
[174957.511976] cleanup_mnt+0x147/0x1c0
[174957.512610] task_work_run+0x5c/0xa0
[174957.513309] exit_to_user_mode_prepare+0x1e5/0x1f0
[174957.514231] syscall_exit_to_user_mode+0x16/0x40
[174957.515069] do_syscall_64+0x48/0xc0
[174957.515718] entry_SYSCALL_64_after_hwframe+0x44/0xae
[174957.516688] RIP: 0033:0x7f328fdc4a97
[174957.517413] Code: 03 0c 00 f7 d8 (...)
[174957.521052] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[174957.522514] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
[174957.523950] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
[174957.525375] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
[174957.526763] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
[174957.528058] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
[174957.529404] </TASK>
[174957.529843] irq event stamp: 0
[174957.530256] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[174957.531061] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.532075] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
[174957.533083] softirqs last disabled at (0): [<0000000000000000>] 0x0
[174957.533865] ---[ end trace bc7ee0c490bce3b0 ]---
[174957.534452] BTRFS info (device dm-0): space_info 4 has 1070841856 free, is not full
[174957.535404] BTRFS info (device dm-0): space_info total=1073741824, used=2785280, pinned=0, reserved=49152, may_use=0, readonly=65536 zone_unusable=0
[174957.537029] BTRFS info (device dm-0): global_block_rsv: size 0 reserved 0
[174957.537859] BTRFS info (device dm-0): trans_block_rsv: size 0 reserved 0
[174957.538697] BTRFS info (device dm-0): chunk_block_rsv: size 0 reserved 0
[174957.539552] BTRFS info (device dm-0): delayed_block_rsv: size 0 reserved 0
[174957.540403] BTRFS info (device dm-0): delayed_refs_rsv: size 0 reserved 0

This also means that in case we have log tree extent buffers that are
still dirty, we can end up not cleaning them up in case we find an
extent buffer with EXTENT_BUFFER_WRITE_ERR set on it, as in that case
we have no way for iterating over the rest of the tree.

This issue is very often triggered with test cases generic/475 and
generic/648 from fstests.

The issue could almost be fixed by iterating over the io tree attached to
each log root which keeps track of the range of allocated extent buffers,
log_root->dirty_log_pages, however that does not work and has some
inconveniences:

1) After we sync the log, we clear the range of the extent buffers from
   the io tree, so we can't find them after writeback. We could keep the
   ranges in the io tree, with a separate bit to signal they represent
   extent buffers already written, but that means we need to hold on to
   more memory until the transaction commits.

   How much more memory is used depends a lot on whether we are able to
   allocate contiguous extent buffers on disk (and how often) for a log
   tree - if we are able to, then a single extent state record can
   represent multiple extent buffers, otherwise we need multiple extent
   state record structures to track each extent buffer.
   In fact, my earlier approach did that:

   https://lore.kernel.org/linux-btrfs/3aae7c6728257c7ce2279d6660ee2797e5e34bb…

   However that can cause a very significant negative impact on
   performance, not only due to the extra memory usage but also because
   we get a larger and deeper dirty_log_pages io tree.
   We got a report that, on beefy machines at least, we can get such
   performance drop with fsmark for example:

   https://lore.kernel.org/linux-btrfs/20220117082426.GE32491@xsang-OptiPlex-9…

2) We would be doing it only to deal with an unexpected and exceptional
   case, which is basically failure to read an extent buffer from disk
   due to IO failures. On a healthy system we don't expect transaction
   aborts to happen after all;

3) Instead of relying on iterating the log tree or tracking the ranges
   of extent buffers in the dirty_log_pages io tree, using the radix
   tree that tracks extent buffers (fs_info->buffer_radix) to find all
   log tree extent buffers is not reliable either, because after writeback
   of an extent buffer it can be evicted from memory by the release page
   callback of the btree inode (btree_releasepage()).

Since there's no way to be able to properly cleanup a log tree without
being able to read its extent buffers from disk and without using more
memory to track the logical ranges of the allocated extent buffers do
the following:

1) When we fail to cleanup a log tree, setup a flag that indicates that
   failure;

2) Trigger writeback of all log tree extent buffers that are still dirty,
   and wait for the writeback to complete. This is just to cleanup their
   state, page states, page leaks, etc;

3) When unmounting the fs, ignore if the number of bytes reserved in a
   block group and in a space_info is not 0 if, and only if, we failed to
   cleanup a log tree. Also ignore only for metadata block groups and the
   metadata space_info object.

This is far from a perfect solution, but it serves to silence test
failures such as those from generic/475 and generic/648. However having
a non-zero value for the reserved bytes counters on unmount after a
transaction abort, is not such a terrible thing and it's completely
harmless, it does not affect the filesystem integrity in any way.

Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
---
Unrelated conflict fix in fs/btrfs/ctree.h

 fs/btrfs/block-group.c | 26 ++++++++++++++++++++++++--
 fs/btrfs/ctree.h       |  7 +++++++
 fs/btrfs/tree-log.c    | 23 +++++++++++++++++++++++
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 5edd07e0232d..e1c5c2114edf 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -123,7 +123,16 @@ void btrfs_put_block_group(struct btrfs_block_group *cache)
 {
 	if (refcount_dec_and_test(&cache->refs)) {
 		WARN_ON(cache->pinned > 0);
-		WARN_ON(cache->reserved > 0);
+		/*
+		 * If there was a failure to cleanup a log tree, very likely due
+		 * to an IO failure on a writeback attempt of one or more of its
+		 * extent buffers, we could not do proper (and cheap) unaccounting
+		 * of their reserved space, so don't warn on reserved > 0 in that
+		 * case.
+		 */
+		if (!(cache->flags & BTRFS_BLOCK_GROUP_METADATA) ||
+		    !BTRFS_FS_LOG_CLEANUP_ERROR(cache->fs_info))
+			WARN_ON(cache->reserved > 0);
 
 		/*
 		 * A block_group shouldn't be on the discard_list anymore.
@@ -3888,9 +3897,22 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 		 * important and indicates a real bug if this happens.
 		 */
 		if (WARN_ON(space_info->bytes_pinned > 0 ||
-			    space_info->bytes_reserved > 0 ||
 			    space_info->bytes_may_use > 0))
 			btrfs_dump_space_info(info, space_info, 0, 0);
+
+		/*
+		 * If there was a failure to cleanup a log tree, very likely due
+		 * to an IO failure on a writeback attempt of one or more of its
+		 * extent buffers, we could not do proper (and cheap) unaccounting
+		 * of their reserved space, so don't warn on bytes_reserved > 0 in
+		 * that case.
+		 */
+		if (!(space_info->flags & BTRFS_BLOCK_GROUP_METADATA) ||
+		    !BTRFS_FS_LOG_CLEANUP_ERROR(info)) {
+			if (WARN_ON(space_info->bytes_reserved > 0))
+				btrfs_dump_space_info(info, space_info, 0, 0);
+		}
+
 		WARN_ON(space_info->reclaim_size > 0);
 		list_del(&space_info->list);
 		btrfs_sysfs_remove_space_info(space_info);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e89f814cc8f5..21c44846b002 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -142,6 +142,9 @@ enum {
 	BTRFS_FS_STATE_DEV_REPLACING,
 	/* The btrfs_fs_info created for self-tests */
 	BTRFS_FS_STATE_DUMMY_FS_INFO,
+
+	/* Indicates there was an error cleaning up a log tree. */
+	BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
 };
 
 #define BTRFS_BACKREF_REV_MAX		256
@@ -3578,6 +3581,10 @@ do {								\
 			  (errno), fmt, ##args);		\
 } while (0)
 
+#define BTRFS_FS_LOG_CLEANUP_ERROR(fs_info)				\
+	(unlikely(test_bit(BTRFS_FS_STATE_LOG_CLEANUP_ERROR,		\
+			   &(fs_info)->fs_state)))
+
 __printf(5, 6)
 __cold
 void __btrfs_panic(struct btrfs_fs_info *fs_info, const char *function,
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8ef65073ce8c..e90d80a8a9e3 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3423,6 +3423,29 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 	if (log->node) {
 		ret = walk_log_tree(trans, log, &wc);
 		if (ret) {
+			/*
+			 * We weren't able to traverse the entire log tree, the
+			 * typical scenario is getting an -EIO when reading an
+			 * extent buffer of the tree, due to a previous writeback
+			 * failure of it.
+			 */
+			set_bit(BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
+				&log->fs_info->fs_state);
+
+			/*
+			 * Some extent buffers of the log tree may still be dirty
+			 * and not yet written back to storage, because we may
+			 * have updates to a log tree without syncing a log tree,
+			 * such as during rename and link operations. So flush
+			 * them out and wait for their writeback to complete, so
+			 * that we properly cleanup their state and pages.
+			 */
+			btrfs_write_marked_extents(log->fs_info,
+						   &log->dirty_log_pages,
+						   EXTENT_DIRTY | EXTENT_NEW);
+			btrfs_wait_tree_log_extents(log,
+						    EXTENT_DIRTY | EXTENT_NEW);
+
 			if (trans)
 				btrfs_abort_transaction(trans, ret);
 			else
-- 
2.33.1
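The warning logic in the patch above can be modelled outside the kernel. The following is only a rough userspace sketch, not btrfs code: `fs_info_model`, `mark_log_cleanup_error()` and `should_warn_reserved()` are hypothetical names, and the bit number and flag value are placeholders chosen for illustration. It shows that a leftover reservation is reported unless the space is metadata and a log tree cleanup error was recorded earlier.

```c
#include <stdbool.h>

/* Placeholder values for illustration only, not taken from the patch. */
#define MODEL_STATE_LOG_CLEANUP_ERROR	0		/* bit number in fs_state */
#define MODEL_BLOCK_GROUP_METADATA	(1ULL << 2)	/* block group type flag */

/* Hypothetical stand-in for the fs_state word of struct btrfs_fs_info. */
struct fs_info_model {
	unsigned long fs_state;
};

/* Models the set_bit() call done when walking the log tree fails. */
static void mark_log_cleanup_error(struct fs_info_model *fs)
{
	fs->fs_state |= 1UL << MODEL_STATE_LOG_CLEANUP_ERROR;
}

/* Models the test_bit() wrapped by BTRFS_FS_LOG_CLEANUP_ERROR(). */
static bool log_cleanup_error(const struct fs_info_model *fs)
{
	return fs->fs_state & (1UL << MODEL_STATE_LOG_CLEANUP_ERROR);
}

/*
 * Same shape as the condition in the patch: the reserved counter is
 * checked unless this is metadata space and a cleanup error was seen.
 */
static bool should_warn_reserved(const struct fs_info_model *fs,
				 unsigned long long flags,
				 unsigned long long reserved)
{
	if (!(flags & MODEL_BLOCK_GROUP_METADATA) || !log_cleanup_error(fs))
		return reserved > 0;
	return false;
}
```

With a zeroed `fs_info_model`, a metadata group with 4096 reserved bytes would trigger the warning. After `mark_log_cleanup_error()` it would not, while a non-metadata group with the same leftover would still be reported.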

3 years, 9 months


[PATCH v2 2/2] PCI: xgene: Revert "PCI: xgene: Fix IB window setup"

by Marc Zyngier

Commit c7a75d07827a ("PCI: xgene: Fix IB window setup") tried to fix
the damage that 6dce5aa59e0b ("PCI: xgene: Use inbound resources for
setup") caused, but actually didn't improve anything for some
platforms (at least Mustang and m400 are still broken).

Given that 6dce5aa59e0b has been reverted, revert this patch as well,
restoring the PCIe support on XGene to its pre-5.5, working state.

Cc: Rob Herring <robh(a)kernel.org>
Cc: Toan Le <toan(a)os.amperecomputing.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: Krzysztof Wilczyński <kw(a)linux.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Stéphane Graber <stgraber(a)ubuntu.com>
Cc: dann frazier <dann.frazier(a)canonical.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Fixes: c7a75d07827a ("PCI: xgene: Fix IB window setup")
Link: https://lore.kernel.org/r/YjN8pT5e6/8cRohQ@xps13.dannf
---
 drivers/pci/controller/pci-xgene.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
index aa41ceaf031f..7c763d820c52 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -465,7 +465,7 @@ static int xgene_pcie_select_ib_reg(u8 *ib_reg_mask, u64 size)
 		return 1;
 	}
 
-	if ((size > SZ_1K) && (size < SZ_4G) && !(*ib_reg_mask & (1 << 0))) {
+	if ((size > SZ_1K) && (size < SZ_1T) && !(*ib_reg_mask & (1 << 0))) {
 		*ib_reg_mask |= (1 << 0);
 		return 0;
 	}
-- 
2.34.1
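To see what the one-line change means for a given dma-range size, here is a small standalone sketch. It models only the region 0 test visible in the hunk above, and the rest of `xgene_pcie_select_ib_reg()` is not reproduced. `try_region0()` is a hypothetical helper that takes the upper bound as a parameter so the two variants can be compared. The size constants are the usual binary sizes, defined locally.

```c
#include <stdint.h>

/* Binary size constants, defined locally for this sketch. */
#define SZ_1K	0x400ULL		/* 2^10 */
#define SZ_4G	0x100000000ULL		/* 2^32 */
#define SZ_1T	0x10000000000ULL	/* 2^40 */

/*
 * Hypothetical helper mirroring the region 0 condition from the hunk:
 * the size must lie strictly between SZ_1K and the upper bound, and
 * bit 0 of the mask must still be free. Returns 0 and claims region 0
 * on success, -1 otherwise (the real function goes on to try other
 * regions, which this sketch does not model).
 */
static int try_region0(uint8_t *ib_reg_mask, uint64_t size, uint64_t upper)
{
	if ((size > SZ_1K) && (size < upper) && !(*ib_reg_mask & (1 << 0))) {
		*ib_reg_mask |= (1 << 0);
		return 0;
	}
	return -1;
}
```

For an 8 GiB range, `try_region0(&mask, 8ULL << 30, SZ_4G)` fails the size test, whereas the same call with `SZ_1T` claims region 0 and sets bit 0 of the mask. A second request then fails because the region is already taken.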

3 years, 9 months


[PATCH v2 1/2] PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"

by Marc Zyngier

Commit 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup")
killed PCIe on my XGene-1 box (a Mustang board). The machine itself
is still alive, but half of its storage (over NVMe) is gone, and the
NVMe driver just times out.

Note that this machine boots with a device tree provided by the
UEFI firmware (2016 vintage), which could well be non-conformant
with the spec, hence the breakage.

With the patch reverted, the box boots 5.17-rc8 with flying colors.

Cc: Rob Herring <robh(a)kernel.org>
Cc: Toan Le <toan(a)os.amperecomputing.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: Krzysztof Wilczyński <kw(a)linux.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Stéphane Graber <stgraber(a)ubuntu.com>
Cc: dann frazier <dann.frazier(a)canonical.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Fixes: 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup")
Link: https://lore.kernel.org/all/Yf2wTLjmcRj+AbDv@xps13.dannf
---
 drivers/pci/controller/pci-xgene.c | 33 ++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
index 0d5acbfc7143..aa41ceaf031f 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -479,28 +479,27 @@ static int xgene_pcie_select_ib_reg(u8 *ib_reg_mask, u64 size)
 }
 
 static void xgene_pcie_setup_ib_reg(struct xgene_pcie *port,
-				    struct resource_entry *entry,
-				    u8 *ib_reg_mask)
+				    struct of_pci_range *range, u8 *ib_reg_mask)
 {
 	void __iomem *cfg_base = port->cfg_base;
 	struct device *dev = port->dev;
 	void __iomem *bar_addr;
 	u32 pim_reg;
-	u64 cpu_addr = entry->res->start;
-	u64 pci_addr = cpu_addr - entry->offset;
-	u64 size = resource_size(entry->res);
+	u64 cpu_addr = range->cpu_addr;
+	u64 pci_addr = range->pci_addr;
+	u64 size = range->size;
 	u64 mask = ~(size - 1) | EN_REG;
 	u32 flags = PCI_BASE_ADDRESS_MEM_TYPE_64;
 	u32 bar_low;
 	int region;
 
-	region = xgene_pcie_select_ib_reg(ib_reg_mask, size);
+	region = xgene_pcie_select_ib_reg(ib_reg_mask, range->size);
 	if (region < 0) {
 		dev_warn(dev, "invalid pcie dma-range config\n");
 		return;
 	}
 
-	if (entry->res->flags & IORESOURCE_PREFETCH)
+	if (range->flags & IORESOURCE_PREFETCH)
 		flags |= PCI_BASE_ADDRESS_MEM_PREFETCH;
 
 	bar_low = pcie_bar_low_val((u32)cpu_addr, flags);
@@ -531,13 +530,25 @@ static void xgene_pcie_setup_ib_reg(struct xgene_pcie *port,
 
 static int xgene_pcie_parse_map_dma_ranges(struct xgene_pcie *port)
 {
-	struct pci_host_bridge *bridge = pci_host_bridge_from_priv(port);
-	struct resource_entry *entry;
+	struct device_node *np = port->node;
+	struct of_pci_range range;
+	struct of_pci_range_parser parser;
+	struct device *dev = port->dev;
 	u8 ib_reg_mask = 0;
 
-	resource_list_for_each_entry(entry, &bridge->dma_ranges)
-		xgene_pcie_setup_ib_reg(port, entry, &ib_reg_mask);
+	if (of_pci_dma_range_parser_init(&parser, np)) {
+		dev_err(dev, "missing dma-ranges property\n");
+		return -EINVAL;
+	}
+
+	/* Get the dma-ranges from DT */
+	for_each_of_pci_range(&parser, &range) {
+		u64 end = range.cpu_addr + range.size - 1;
+		dev_dbg(dev, "0x%08x 0x%016llx..0x%016llx -> 0x%016llx\n",
+			range.flags, range.cpu_addr, end, range.pci_addr);
+		xgene_pcie_setup_ib_reg(port, &range, &ib_reg_mask);
+	}
 
 	return 0;
 }
-- 
2.34.1
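The two pieces of arithmetic in the diff above, the inbound window mask `~(size - 1) | EN_REG` and the inclusive end address `cpu_addr + size - 1`, can be checked in isolation. This is a minimal userspace sketch: `ib_window_mask()` and `range_end()` are hypothetical helper names, and the value of `EN_REG` is assumed to be bit 0 purely for illustration, since the patch does not show its definition. The mask expression only yields a clean address mask when `size` is a power of two.

```c
#include <stdint.h>

/* Assumed value for illustration; the definition is not part of the patch. */
#define EN_REG	0x1ULL

/* Clears the low bits covered by the window size and sets the enable bit. */
static uint64_t ib_window_mask(uint64_t size)
{
	return ~(size - 1) | EN_REG;
}

/* Inclusive last CPU address of a range, as printed by the dev_dbg() call. */
static uint64_t range_end(uint64_t cpu_addr, uint64_t size)
{
	return cpu_addr + size - 1;
}
```

Under these assumptions a 4 GiB window gives `ib_window_mask(0x100000000ULL) == 0xFFFFFFFF00000001ULL`, and a 4 GiB range starting at `0x8000000000` ends at `0x80FFFFFFFF`.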

3 years, 9 months


The linux-5.17.y tag looks bogus.

by Sebastian Andrzej Siewior

Hi,

I just noticed that the stable repository has the linux-5.17.y tag and
no branch with the linux-5.17.y name. That tag looks like a copy of
Linus' v5.17.
I guess this is a mistake. On my side git refused to push the
linux-5.17.y branch because it already had a tag with the same name.
Could you please remove it?

Sebastian

3 years, 9 months

