This is a note to let you know that I've just added the patch titled
serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 534cf755d9df99e214ddbe26b91cd4d81d2603e2 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 30 Sep 2020 13:04:32 +0100
Subject: serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt
Issuing a magic-sysrq via the PL011 causes the following lockdep splat,
which is easily reproducible under QEMU:
| sysrq: Changing Loglevel
| sysrq: Loglevel set to 9
|
| ======================================================
| WARNING: possible circular locking dependency detected
| 5.9.0-rc7 #1 Not tainted
| ------------------------------------------------------
| systemd-journal/138 is trying to acquire lock:
| ffffab133ad950c0 (console_owner){-.-.}-{0:0}, at: console_lock_spinning_enable+0x34/0x70
|
| but task is already holding lock:
| ffff0001fd47b098 (&port_lock_key){-.-.}-{2:2}, at: pl011_int+0x40/0x488
|
| which lock already depends on the new lock.
[...]
| Possible unsafe locking scenario:
|
| CPU0 CPU1
| ---- ----
| lock(&port_lock_key);
| lock(console_owner);
| lock(&port_lock_key);
| lock(console_owner);
|
| *** DEADLOCK ***
The issue is that CPU0 takes 'port_lock' on the irq path in pl011_int()
before taking 'console_owner' on the printk() path, whereas CPU1 takes
the two locks in the opposite order on the printk() path because it sets
"console_owner" prior to calling into the actual console driver.
Fix this in the same way as the msm-serial driver, by dropping 'port_lock'
before handling the sysrq character.
Cc: <stable@vger.kernel.org> # 4.19+
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20200811101313.GA6970@willie-the-truck
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200930120432.16551-1-will@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/tty/serial/amba-pl011.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 67498594d7d7..87dc3fc15694 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -308,8 +308,9 @@ static void pl011_write(unsigned int val, const struct uart_amba_port *uap,
*/
static int pl011_fifo_to_tty(struct uart_amba_port *uap)
{
- u16 status;
unsigned int ch, flag, fifotaken;
+ int sysrq;
+ u16 status;
for (fifotaken = 0; fifotaken != 256; fifotaken++) {
status = pl011_read(uap, REG_FR);
@@ -344,10 +345,12 @@ static int pl011_fifo_to_tty(struct uart_amba_port *uap)
flag = TTY_FRAME;
}
- if (uart_handle_sysrq_char(&uap->port, ch & 255))
- continue;
+ spin_unlock(&uap->port.lock);
+ sysrq = uart_handle_sysrq_char(&uap->port, ch & 255);
+ spin_lock(&uap->port.lock);
- uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag);
+ if (!sysrq)
+ uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag);
}
return fifotaken;
--
2.28.0
This is a note to let you know that I've just added the patch titled
tty: serial: fsl_lpuart: fix lpuart32_poll_get_char
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 29788ab1d2bf26c130de8f44f9553ee78a27e8d5 Mon Sep 17 00:00:00 2001
From: Peng Fan <peng.fan@nxp.com>
Date: Tue, 29 Sep 2020 17:55:09 +0800
Subject: tty: serial: fsl_lpuart: fix lpuart32_poll_get_char
The watermark is set to 1, and RDRF is only asserted once the FIFO level
exceeds the watermark, so with the original logic two chars have to be
input before one can be read. With the new logic, which checks the RX
FIFO count directly, the char can always be read as soon as there is
data in the FIFO.
Suggested-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20200929095509.21680-1-peng.fan@nxp.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/tty/serial/fsl_lpuart.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 645bbb24b433..590c901a1fc5 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -680,7 +680,7 @@ static void lpuart32_poll_put_char(struct uart_port *port, unsigned char c)
static int lpuart32_poll_get_char(struct uart_port *port)
{
- if (!(lpuart32_read(port, UARTSTAT) & UARTSTAT_RDRF))
+ if (!(lpuart32_read(port, UARTWATER) >> UARTWATER_RXCNT_OFF))
return NO_POLL_CHAR;
return lpuart32_read(port, UARTDATA);
--
2.28.0
This is a note to let you know that I've just added the patch titled
tty: serial: lpuart: fix lpuart32_write usage
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 9ea40db477c024dcb87321b7f32bd6ea12130ac2 Mon Sep 17 00:00:00 2001
From: Peng Fan <peng.fan@nxp.com>
Date: Tue, 29 Sep 2020 17:19:20 +0800
Subject: tty: serial: lpuart: fix lpuart32_write usage
The 2nd and 3rd parameters of lpuart32_write() were passed in the wrong
order, causing a kernel abort when debugging with kgdb.
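For reference, and as implied by the corrected calls in the diff below,
the lpuart32_write() helper takes the value to write before the register
offset (prototype sketched from the driver; exact types are assumptions):

	static void lpuart32_write(struct uart_port *port, u32 val, unsigned int off);

With the arguments swapped, every call wrote the register offset as data
into whatever register the intended value happened to select.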
Fixes: 1da17d7cf8e2 ("tty: serial: fsl_lpuart: Use appropriate lpuart32_* I/O funcs")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20200929091920.22612-1-peng.fan@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/tty/serial/fsl_lpuart.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 5a5a22d77841..645bbb24b433 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -649,26 +649,24 @@ static int lpuart32_poll_init(struct uart_port *port)
spin_lock_irqsave(&sport->port.lock, flags);
/* Disable Rx & Tx */
- lpuart32_write(&sport->port, UARTCTRL, 0);
+ lpuart32_write(&sport->port, 0, UARTCTRL);
temp = lpuart32_read(&sport->port, UARTFIFO);
/* Enable Rx and Tx FIFO */
- lpuart32_write(&sport->port, UARTFIFO,
- temp | UARTFIFO_RXFE | UARTFIFO_TXFE);
+ lpuart32_write(&sport->port, temp | UARTFIFO_RXFE | UARTFIFO_TXFE, UARTFIFO);
/* flush Tx and Rx FIFO */
- lpuart32_write(&sport->port, UARTFIFO,
- UARTFIFO_TXFLUSH | UARTFIFO_RXFLUSH);
+ lpuart32_write(&sport->port, UARTFIFO_TXFLUSH | UARTFIFO_RXFLUSH, UARTFIFO);
/* explicitly clear RDRF */
if (lpuart32_read(&sport->port, UARTSTAT) & UARTSTAT_RDRF) {
lpuart32_read(&sport->port, UARTDATA);
- lpuart32_write(&sport->port, UARTFIFO, UARTFIFO_RXUF);
+ lpuart32_write(&sport->port, UARTFIFO_RXUF, UARTFIFO);
}
/* Enable Rx and Tx */
- lpuart32_write(&sport->port, UARTCTRL, UARTCTRL_RE | UARTCTRL_TE);
+ lpuart32_write(&sport->port, UARTCTRL_RE | UARTCTRL_TE, UARTCTRL);
spin_unlock_irqrestore(&sport->port.lock, flags);
return 0;
@@ -677,7 +675,7 @@ static int lpuart32_poll_init(struct uart_port *port)
static void lpuart32_poll_put_char(struct uart_port *port, unsigned char c)
{
lpuart32_wait_bit_set(port, UARTSTAT, UARTSTAT_TDRE);
- lpuart32_write(port, UARTDATA, c);
+ lpuart32_write(port, c, UARTDATA);
}
static int lpuart32_poll_get_char(struct uart_port *port)
--
2.28.0
This is a note to let you know that I've just added the patch titled
serial: qcom_geni_serial: To correct QUP Version detection logic
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From c9ca43d42ed8d5fd635d327a664ed1d8579eb2af Mon Sep 17 00:00:00 2001
From: Paras Sharma <parashar@codeaurora.org>
Date: Wed, 30 Sep 2020 11:35:26 +0530
Subject: serial: qcom_geni_serial: To correct QUP Version detection logic
For QUP IP versions 2.5 and above, the oversampling rate is halved from
32 to 16.
Commit ce734600545f ("tty: serial: qcom_geni_serial: Update the
oversampling rate") was applied to handle this scenario, but its check
fails to classify QUP version 3.0 into the correct group (2.5 and
above), since it requires both the major version to be >= 2 and the
minor version to be >= 5.
As a result, the Serial Engine clocks are not configured properly for
the baud rate, and garbage data is sampled into the FIFOs from the line.
So fix the logic to detect all QUP versions 2.5 and above.
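As an illustration, assuming the packed version encoding implied by
QUP_SE_VERSION_2_5 below (major in the top bits, then minor, then step),
the two checks evaluate for QUP v3.0 as:

	/* illustrative: ver for QUP v3.0 would be 0x30000000 */
	old: MAJOR(ver) >= 2 && MINOR(ver) >= 5  ->  3 >= 2 && 0 >= 5        ->  false
	new: ver >= QUP_SE_VERSION_2_5           ->  0x30000000 >= 0x20050000 ->  true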
Fixes: ce734600545f ("tty: serial: qcom_geni_serial: Update the oversampling rate")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Paras Sharma <parashar@codeaurora.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Link: https://lore.kernel.org/r/1601445926-23673-1-git-send-email-parashar@codeau…
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/tty/serial/qcom_geni_serial.c | 2 +-
include/linux/qcom-geni-se.h | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 8da2eb2d1090..291649f02821 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1000,7 +1000,7 @@ static void qcom_geni_serial_set_termios(struct uart_port *uport,
sampling_rate = UART_OVERSAMPLING;
/* Sampling rate is halved for IP versions >= 2.5 */
ver = geni_se_get_qup_hw_version(&port->se);
- if (GENI_SE_VERSION_MAJOR(ver) >= 2 && GENI_SE_VERSION_MINOR(ver) >= 5)
+ if (ver >= QUP_SE_VERSION_2_5)
sampling_rate /= 2;
clk_rate = get_clk_div_rate(baud, sampling_rate, &clk_div);
diff --git a/include/linux/qcom-geni-se.h b/include/linux/qcom-geni-se.h
index 8f385fbe5a0e..1c31f26ccc7a 100644
--- a/include/linux/qcom-geni-se.h
+++ b/include/linux/qcom-geni-se.h
@@ -248,6 +248,9 @@ struct geni_se {
#define GENI_SE_VERSION_MINOR(ver) ((ver & HW_VER_MINOR_MASK) >> HW_VER_MINOR_SHFT)
#define GENI_SE_VERSION_STEP(ver) (ver & HW_VER_STEP_MASK)
+/* QUP SE VERSION value for major number 2 and minor number 5 */
+#define QUP_SE_VERSION_2_5 0x20050000
+
/*
* Define bandwidth thresholds that cause the underlying Core 2X interconnect
* clock to run at the named frequency. These baseline values are recommended
--
2.28.0
Commit a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab
objects") adds the check for slab pages, but pages that have no
page_count are still missing from the check.
The network layer's sendpage method is not designed to send pages with
a page_count of 0 either, so both PageSlab() and page_count() should be
checked for the page being sent. This is exactly what sendpage_ok()
does.
This patch uses sendpage_ok() in do_tcp_sendpages() to detect a misused
.sendpage and make the code more robust.
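For reference, the helper (introduced by a later patch in this series,
shown further below) reduces to exactly these two checks:

	static inline bool sendpage_ok(struct page *page)
	{
		return !PageSlab(page) && page_count(page) >= 1;
	}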
Fixes: a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
---
net/ipv4/tcp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 31f3b858db81..2135ee7c806d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -970,7 +970,8 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
if (IS_ENABLED(CONFIG_DEBUG_VM) &&
- WARN_ONCE(PageSlab(page), "page must not be a Slab one"))
+ WARN_ONCE(!sendpage_ok(page),
+ "page must not be a Slab one and have page_count > 0"))
return -EINVAL;
/* Wait for a connection to finish. One exception is TCP Fast Open
--
2.26.2
Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
send slab pages. But pages allocated by __get_free_pages() without
__GFP_COMP, which also have a refcount of 0, are still sent to the
remote end by kernel_sendpage(), which is problematic.
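A minimal sketch of the problematic allocation (illustrative only; the
order and flags here are assumptions, not taken from the driver):

	/* higher-order, non-compound allocation: 4 contiguous pages */
	unsigned long buf = __get_free_pages(GFP_KERNEL, 2);
	struct page *first = virt_to_page((void *)buf);

	/*
	 * Without __GFP_COMP the tail pages after 'first' have
	 * page_count() == 0, so the get_page()/put_page() done on the
	 * sendpage path can free pages that are still in use, while
	 * sock_no_sendpage() (which copies the data) remains safe.
	 */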
The newly introduced helper sendpage_ok() checks both the PageSlab tag
and the page_count counter, and returns true if the page is OK to be
sent by kernel_sendpage().
This patch fixes the page checking in nvme_tcp_try_send_data() with
sendpage_ok(): if sendpage_ok() returns true, send the page by
kernel_sendpage(); otherwise use sock_no_sendpage() to handle it.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Vlastimil Babka <vbabka@suse.com>
Cc: stable@vger.kernel.org
---
drivers/nvme/host/tcp.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 8f4f29f18b8c..d6a3e1487354 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -913,12 +913,11 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
else
flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
- /* can't zcopy slab pages */
- if (unlikely(PageSlab(page))) {
- ret = sock_no_sendpage(queue->sock, page, offset, len,
+ if (sendpage_ok(page)) {
+ ret = kernel_sendpage(queue->sock, page, offset, len,
flags);
} else {
- ret = kernel_sendpage(queue->sock, page, offset, len,
+ ret = sock_no_sendpage(queue->sock, page, offset, len,
flags);
}
if (ret <= 0)
--
2.26.2
The original problem came from the nvme-over-tcp code, which mistakenly
used kernel_sendpage() to send pages allocated by __get_free_pages()
without the __GFP_COMP flag. Such pages have no refcount (page_count is
0) on their tail pages, and sending them by kernel_sendpage() may
trigger a kernel panic from a corrupted kernel heap, because these pages
are incorrectly freed in the network stack as page_count 0 pages.
This patch introduces a helper, sendpage_ok(), which returns true if the
page to check:
- is not a slab page: PageSlab(page) is false.
- has a page refcount: page_count(page) is not zero.
Any driver that wants to send a page to a remote end by
kernel_sendpage() may use this helper to check whether the page is OK.
If the helper does not return true, the driver should fall back to a
non-sendpage method (e.g. sock_no_sendpage()) to handle the page.
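The intended calling pattern, mirroring the nvme-tcp fix earlier in this
series (sketched with generic sock/page/offset/len/flags arguments):

	if (sendpage_ok(page))
		ret = kernel_sendpage(sock, page, offset, len, flags);
	else
		ret = sock_no_sendpage(sock, page, offset, len, flags);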
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Vlastimil Babka <vbabka@suse.com>
Cc: stable@vger.kernel.org
---
include/linux/net.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..ae713c851342 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -21,6 +21,7 @@
#include <linux/rcupdate.h>
#include <linux/once.h>
#include <linux/fs.h>
+#include <linux/mm.h>
#include <linux/sockptr.h>
#include <uapi/linux/net.h>
@@ -286,6 +287,21 @@ do { \
#define net_get_random_once_wait(buf, nbytes) \
get_random_once_wait((buf), (nbytes))
+/*
+ * E.g. XFS meta- & log-data is in slab pages, or bcache meta
+ * data pages, or other high order pages allocated by
+ * __get_free_pages() without __GFP_COMP, which have a page_count
+ * of 0 and/or have PageSlab() set. We cannot use send_page for
+ * those, as that does get_page(); put_page(); and would cause
+ * either a VM_BUG directly, or __page_cache_release a page that
+ * would actually still be referenced by someone, leading to some
+ * obscure delayed Oops somewhere else.
+ */
+static inline bool sendpage_ok(struct page *page)
+{
+ return !PageSlab(page) && page_count(page) >= 1;
+}
+
int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec,
size_t num, size_t len);
int kernel_sendmsg_locked(struct sock *sk, struct msghdr *msg,
--
2.26.2
Although btrfs_calc_avail_data_space() tries to estimate how many data
chunks can be allocated, the estimation is far from perfect:
- Metadata over-commit is not considered at all
- Chunk allocation doesn't take RAID5/6 into consideration
This patch changes btrfs_calc_avail_data_space() to use the
pre-calculated per-profile available space.
This provides the following benefit:
- Accurate unallocated data space estimation
  It's as accurate as the chunk allocator, and can handle RAID5/6 and
  the newly introduced RAID1C3/C4.
Metadata over-commit is still not taken into consideration, as
over-commit only happens when we have enough unallocated space, and in
most cases we won't use that much metadata space anyway.
We also keep the existing 0-available-space check, to prevent reporting
a too optimistic f_bavail result.
Since we keep the old lock-free design, statfs should not experience
any extra delay.
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/super.c | 131 +++--------------------------------------------
1 file changed, 7 insertions(+), 124 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 25967ecaaf0a..355e4f6a2fd4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2016,124 +2016,6 @@ static inline void btrfs_descending_sort_devices(
btrfs_cmp_device_free_bytes, NULL);
}
-/*
- * The helper to calc the free space on the devices that can be used to store
- * file data.
- */
-static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
- u64 *free_bytes)
-{
- struct btrfs_device_info *devices_info;
- struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
- struct btrfs_device *device;
- u64 type;
- u64 avail_space;
- u64 min_stripe_size;
- int num_stripes = 1;
- int i = 0, nr_devices;
- const struct btrfs_raid_attr *rattr;
-
- /*
- * We aren't under the device list lock, so this is racy-ish, but good
- * enough for our purposes.
- */
- nr_devices = fs_info->fs_devices->open_devices;
- if (!nr_devices) {
- smp_mb();
- nr_devices = fs_info->fs_devices->open_devices;
- ASSERT(nr_devices);
- if (!nr_devices) {
- *free_bytes = 0;
- return 0;
- }
- }
-
- devices_info = kmalloc_array(nr_devices, sizeof(*devices_info),
- GFP_KERNEL);
- if (!devices_info)
- return -ENOMEM;
-
- /* calc min stripe number for data space allocation */
- type = btrfs_data_alloc_profile(fs_info);
- rattr = &btrfs_raid_array[btrfs_bg_flags_to_raid_index(type)];
-
- if (type & BTRFS_BLOCK_GROUP_RAID0)
- num_stripes = nr_devices;
- else if (type & BTRFS_BLOCK_GROUP_RAID1)
- num_stripes = 2;
- else if (type & BTRFS_BLOCK_GROUP_RAID1C3)
- num_stripes = 3;
- else if (type & BTRFS_BLOCK_GROUP_RAID1C4)
- num_stripes = 4;
- else if (type & BTRFS_BLOCK_GROUP_RAID10)
- num_stripes = 4;
-
- /* Adjust for more than 1 stripe per device */
- min_stripe_size = rattr->dev_stripes * BTRFS_STRIPE_LEN;
-
- rcu_read_lock();
- list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
- if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
- &device->dev_state) ||
- !device->bdev ||
- test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
- continue;
-
- if (i >= nr_devices)
- break;
-
- avail_space = device->total_bytes - device->bytes_used;
-
- /* align with stripe_len */
- avail_space = rounddown(avail_space, BTRFS_STRIPE_LEN);
-
- /*
- * In order to avoid overwriting the superblock on the drive,
- * btrfs starts at an offset of at least 1MB when doing chunk
- * allocation.
- *
- * This ensures we have at least min_stripe_size free space
- * after excluding 1MB.
- */
- if (avail_space <= SZ_1M + min_stripe_size)
- continue;
-
- avail_space -= SZ_1M;
-
- devices_info[i].dev = device;
- devices_info[i].max_avail = avail_space;
-
- i++;
- }
- rcu_read_unlock();
-
- nr_devices = i;
-
- btrfs_descending_sort_devices(devices_info, nr_devices);
-
- i = nr_devices - 1;
- avail_space = 0;
- while (nr_devices >= rattr->devs_min) {
- num_stripes = min(num_stripes, nr_devices);
-
- if (devices_info[i].max_avail >= min_stripe_size) {
- int j;
- u64 alloc_size;
-
- avail_space += devices_info[i].max_avail * num_stripes;
- alloc_size = devices_info[i].max_avail;
- for (j = i + 1 - num_stripes; j <= i; j++)
- devices_info[j].max_avail -= alloc_size;
- }
- i--;
- nr_devices--;
- }
-
- kfree(devices_info);
- *free_bytes = avail_space;
- return 0;
-}
-
/*
* Calculate numbers for 'df', pessimistic in case of mixed raid profiles.
*
@@ -2150,6 +2032,7 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
+ struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_super_block *disk_super = fs_info->super_copy;
struct btrfs_space_info *found;
u64 total_used = 0;
@@ -2159,7 +2042,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
__be32 *fsid = (__be32 *)fs_info->fs_devices->fsid;
unsigned factor = 1;
struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
- int ret;
+ enum btrfs_raid_types data_type;
u64 thresh = 0;
int mixed = 0;
@@ -2208,11 +2091,11 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
buf->f_bfree = 0;
spin_unlock(&block_rsv->lock);
- buf->f_bavail = div_u64(total_free_data, factor);
- ret = btrfs_calc_avail_data_space(fs_info, &total_free_data);
- if (ret)
- return ret;
- buf->f_bavail += div_u64(total_free_data, factor);
+ data_type = btrfs_bg_flags_to_raid_index(
+ btrfs_data_alloc_profile(fs_info));
+
+ buf->f_bavail = total_free_data +
+ atomic64_read(&fs_devices->per_profile_avail[data_type]);
buf->f_bavail = buf->f_bavail >> bits;
/*
--
2.28.0
For the following disk layout, can_overcommit() can cause false
confidence in the available space:

  devid 1 unallocated: 1T
  devid 2 unallocated: 10T
  metadata type: RAID1

Since can_overcommit() simply applies a factor to the unallocated space
to calculate the allocatable metadata chunk size, it believes we still
have 5.5T for metadata chunks, while in truth we only have 1T available
for metadata chunks.
This can lead to ENOSPC at run_delalloc_range() and cause a transaction
abort.
Since the factor-based calculation can't distinguish RAID1/RAID10 from
DUP at all, we need proper chunk-allocator-level awareness to do such an
estimation.
Thankfully, we already have the per-profile available space calculated;
just use that facility to avoid such false confidence.
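A worked comparison for the layout above (RAID1 has a factor of 2, as
only half the raw space is usable):

	factor-based (old):      avail = (1T + 10T) / 2 = 5.5T
	chunk-allocator-aware:   each RAID1 chunk needs one stripe on two
	                         distinct devices, so allocation stops once
	                         devid 1 is exhausted: avail = 1T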
CC: stable@vger.kernel.org # 5.4+
Reported-by: Marc Lehmann <schmorp@schmorp.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/space-info.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 64b6e1d44f47..4bb4e3c3531f 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -336,25 +336,21 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info,
enum btrfs_reserve_flush_enum flush)
{
+ enum btrfs_raid_types index;
u64 profile;
u64 avail;
- int factor;
if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
profile = btrfs_system_alloc_profile(fs_info);
else
profile = btrfs_metadata_alloc_profile(fs_info);
- avail = atomic64_read(&fs_info->free_chunk_space);
-
/*
- * If we have dup, raid1 or raid10 then only half of the free
- * space is actually usable. For raid56, the space info used
- * doesn't include the parity drive, so we don't have to
- * change the math
+ * Grab avail space from per-profile array which should be as accurate
+ * as chunk allocator.
*/
- factor = btrfs_bg_type_to_factor(profile);
- avail = div_u64(avail, factor);
+ index = btrfs_bg_flags_to_raid_index(profile);
+ avail = atomic64_read(&fs_info->fs_devices->per_profile_avail[index]);
/*
* If we aren't flushing all things, let us overcommit up to
--
2.28.0