Dear Friend,
Good day to you ,thank you for your time and I hope you and your
family are staying safe. My name is Wesley Adams, I represent
Belair Investment LP UK as the company secretary. We are
Corporate Financing company that generally engage in Project
Financing. In the cause of our research for companies /
individuals applying for project funding, we came across your
contact details. At present, We have funds available to finance
projects in a deposit account with HSBC UK.
Kindly reply with your Proposal (Project plan/Executive Summary)
Best regards
Wesley Adams
Company Secretary
Belair Investment Lp.
This is a note to let you know that I've just added the patch titled
serial: 8250_pci: rewrite pericom_do_set_divisor()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From bb1201d4b38ec67bd9a871cf86b0cc10f28b15b5 Mon Sep 17 00:00:00 2001
From: Jay Dolan <jay.dolan(a)accesio.com>
Date: Mon, 22 Nov 2021 14:06:04 +0200
Subject: serial: 8250_pci: rewrite pericom_do_set_divisor()
Have pericom_do_set_divisor() use the uartclk instead of a hard coded
value to work with different speed crystals. Tested with 14.7456 and 24
MHz crystals.
Have pericom_do_set_divisor() always calculate the divisor rather than
call serial8250_do_set_divisor() for rates below baud_base.
Do not write registers or call serial8250_do_set_divisor() if valid
divisors could not be found.
Fixes: 6bf4e42f1d19 ("serial: 8250: Add support for higher baud rates to Pericom chips")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Jay Dolan <jay.dolan(a)accesio.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20211122120604.3909-3-andriy.shevchenko@linux.int…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_pci.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c
index b793d848aeb6..60f8fffdfd77 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -1324,29 +1324,33 @@ pericom_do_set_divisor(struct uart_port *port, unsigned int baud,
{
int scr;
int lcr;
- int actual_baud;
- int tolerance;
- for (scr = 5 ; scr <= 15 ; scr++) {
- actual_baud = 921600 * 16 / scr;
- tolerance = actual_baud / 50;
+ for (scr = 16; scr > 4; scr--) {
+ unsigned int maxrate = port->uartclk / scr;
+ unsigned int divisor = max(maxrate / baud, 1U);
+ int delta = maxrate / divisor - baud;
- if ((baud < actual_baud + tolerance) &&
- (baud > actual_baud - tolerance)) {
+ if (baud > maxrate + baud / 50)
+ continue;
+ if (delta > baud / 50)
+ divisor++;
+
+ if (divisor > 0xffff)
+ continue;
+
+ /* Update delta due to possible divisor change */
+ delta = maxrate / divisor - baud;
+ if (abs(delta) < baud / 50) {
lcr = serial_port_in(port, UART_LCR);
serial_port_out(port, UART_LCR, lcr | 0x80);
-
- serial_port_out(port, UART_DLL, 1);
- serial_port_out(port, UART_DLM, 0);
+ serial_port_out(port, UART_DLL, divisor & 0xff);
+ serial_port_out(port, UART_DLM, divisor >> 8 & 0xff);
serial_port_out(port, 2, 16 - scr);
serial_port_out(port, UART_LCR, lcr);
return;
- } else if (baud > actual_baud) {
- break;
}
}
- serial8250_do_set_divisor(port, baud, quot, quot_frac);
}
static int pci_pericom_setup(struct serial_private *priv,
const struct pciserial_board *board,
--
2.34.0
This is a note to let you know that I've just added the patch titled
serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From c525c5d2437f93520388920baac6d9340c65d239 Mon Sep 17 00:00:00 2001
From: Jay Dolan <jay.dolan(a)accesio.com>
Date: Mon, 22 Nov 2021 14:06:03 +0200
Subject: serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
Fix error in table for PCI_DEVICE_ID_ACCESIO_PCIE_ICM_4S that caused it
and PCI_DEVICE_ID_ACCESIO_PCIE_ICM232_4 to be missing their fourth port.
Fixes: 78d3820b9bd3 ("serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Jay Dolan <jay.dolan(a)accesio.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20211122120604.3909-2-andriy.shevchenko@linux.int…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_pci.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c
index 5d43de143f33..b793d848aeb6 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -2291,12 +2291,19 @@ static struct pci_serial_quirk pci_serial_quirks[] = {
.setup = pci_pericom_setup_four_at_eight,
},
{
- .vendor = PCI_DEVICE_ID_ACCESIO_PCIE_ICM_4S,
+ .vendor = PCI_VENDOR_ID_ACCESIO,
.device = PCI_DEVICE_ID_ACCESIO_PCIE_ICM232_4,
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
.setup = pci_pericom_setup_four_at_eight,
},
+ {
+ .vendor = PCI_VENDOR_ID_ACCESIO,
+ .device = PCI_DEVICE_ID_ACCESIO_PCIE_ICM_4S,
+ .subvendor = PCI_ANY_ID,
+ .subdevice = PCI_ANY_ID,
+ .setup = pci_pericom_setup_four_at_eight,
+ },
{
.vendor = PCI_VENDOR_ID_ACCESIO,
.device = PCI_DEVICE_ID_ACCESIO_MPCIE_ICM232_4,
--
2.34.0
This is a note to let you know that I've just added the patch titled
serial: tegra: Change lower tolerance baud rate limit for tegra20 and
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From b40de7469ef135161c80af0e8c462298cc5dac00 Mon Sep 17 00:00:00 2001
From: Patrik John <patrik.john(a)u-blox.com>
Date: Tue, 23 Nov 2021 14:27:38 +0100
Subject: serial: tegra: Change lower tolerance baud rate limit for tegra20 and
tegra30
The current implementation uses 0 as lower limit for the baud rate
tolerance for tegra20 and tegra30 chips which causes isses on UART
initialization as soon as baud rate clock is lower than required even
when within the standard UART tolerance of +/- 4%.
This fix aligns the implementation with the initial commit description
of +/- 4% tolerance for tegra chips other than tegra186 and
tegra194.
Fixes: d781ec21bae6 ("serial: tegra: report clk rate errors")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Patrik John <patrik.john(a)u-blox.com>
Link: https://lore.kernel.org/r/sig.19614244f8.20211123132737.88341-1-patrik.john…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/serial-tegra.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/serial-tegra.c b/drivers/tty/serial/serial-tegra.c
index 45e2e4109acd..b6223fab0687 100644
--- a/drivers/tty/serial/serial-tegra.c
+++ b/drivers/tty/serial/serial-tegra.c
@@ -1506,7 +1506,7 @@ static struct tegra_uart_chip_data tegra20_uart_chip_data = {
.fifo_mode_enable_status = false,
.uart_max_port = 5,
.max_dma_burst_bytes = 4,
- .error_tolerance_low_range = 0,
+ .error_tolerance_low_range = -4,
.error_tolerance_high_range = 4,
};
@@ -1517,7 +1517,7 @@ static struct tegra_uart_chip_data tegra30_uart_chip_data = {
.fifo_mode_enable_status = false,
.uart_max_port = 5,
.max_dma_burst_bytes = 4,
- .error_tolerance_low_range = 0,
+ .error_tolerance_low_range = -4,
.error_tolerance_high_range = 4,
};
--
2.34.0
This is a note to let you know that I've just added the patch titled
serial: liteuart: Fix NULL pointer dereference in ->remove()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 0f55f89d98c8b3e12b4f55f71c127a173e29557c Mon Sep 17 00:00:00 2001
From: Ilia Sergachev <silia(a)ethz.ch>
Date: Mon, 15 Nov 2021 22:49:44 +0100
Subject: serial: liteuart: Fix NULL pointer dereference in ->remove()
drvdata has to be set in _probe() - otherwise platform_get_drvdata()
causes null pointer dereference BUG in _remove().
Fixes: 1da81e5562fa ("drivers/tty/serial: add LiteUART driver")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Ilia Sergachev <silia(a)ethz.ch>
Link: https://lore.kernel.org/r/20211115224944.23f8c12b@dtkw
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/liteuart.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/tty/serial/liteuart.c b/drivers/tty/serial/liteuart.c
index dbc0559a9157..f075f4ff5fcf 100644
--- a/drivers/tty/serial/liteuart.c
+++ b/drivers/tty/serial/liteuart.c
@@ -285,6 +285,8 @@ static int liteuart_probe(struct platform_device *pdev)
port->line = dev_id;
spin_lock_init(&port->lock);
+ platform_set_drvdata(pdev, port);
+
return uart_add_one_port(&liteuart_driver, &uart->port);
}
--
2.34.0
This is a note to let you know that I've just added the patch titled
serial: pl011: Add ACPI SBSA UART match id
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From ac442a077acf9a6bf1db4320ec0c3f303be092b3 Mon Sep 17 00:00:00 2001
From: Pierre Gondois <Pierre.Gondois(a)arm.com>
Date: Tue, 9 Nov 2021 17:22:48 +0000
Subject: serial: pl011: Add ACPI SBSA UART match id
The document 'ACPI for Arm Components 1.0' defines the following
_HID mappings:
-'Prime cell UART (PL011)': ARMH0011
-'SBSA UART': ARMHB000
Use the sbsa-uart driver when a device is described with
the 'ARMHB000' _HID.
Note:
PL011 devices currently use the sbsa-uart driver instead of the
uart-pl011 driver. Indeed, PL011 devices are not bound to a clock
in ACPI. It is not possible to change their baudrate.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Pierre Gondois <Pierre.Gondois(a)arm.com>
Link: https://lore.kernel.org/r/20211109172248.19061-1-Pierre.Gondois@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/amba-pl011.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index d361cd84ff8c..52518a606c06 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2947,6 +2947,7 @@ MODULE_DEVICE_TABLE(of, sbsa_uart_of_match);
static const struct acpi_device_id __maybe_unused sbsa_uart_acpi_match[] = {
{ "ARMH0011", 0 },
+ { "ARMHB000", 0 },
{},
};
MODULE_DEVICE_TABLE(acpi, sbsa_uart_acpi_match);
--
2.34.0
The following commit has been merged into the irq/irqchip-fixes branch of irqchip:
Commit-ID: 8958389681b929fcc7301e7dc5f0da12e4a256a0
Gitweb: https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/895838968…
Author: Billy Tsai <billy_tsai(a)aspeedtech.com>
AuthorDate: Wed, 24 Nov 2021 17:43:48 +08:00
Committer: Marc Zyngier <maz(a)kernel.org>
CommitterDate: Thu, 25 Nov 2021 16:50:44
irqchip/aspeed-scu: Replace update_bits with write_bits.
The interrupt status bits are cleared by writing 1, we should force a
write to clear the interrupt without checking if the value has changed.
Fixes: 04f605906ff0 ("irqchip: Add Aspeed SCU interrupt controller")
Signed-off-by: Billy Tsai <billy_tsai(a)aspeedtech.com>
Reviewed-by: Joel Stanley <joel(a)jms.id.au>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20211124094348.11621-1-billy_tsai@aspeedtech.com
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-aspeed-scu-ic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/irqchip/irq-aspeed-scu-ic.c b/drivers/irqchip/irq-aspeed-scu-ic.c
index f3c6855..18b77c3 100644
--- a/drivers/irqchip/irq-aspeed-scu-ic.c
+++ b/drivers/irqchip/irq-aspeed-scu-ic.c
@@ -76,8 +76,8 @@ static void aspeed_scu_ic_irq_handler(struct irq_desc *desc)
generic_handle_domain_irq(scu_ic->irq_domain,
bit - scu_ic->irq_shift);
- regmap_update_bits(scu_ic->scu, scu_ic->reg, mask,
- BIT(bit + ASPEED_SCU_IC_STATUS_SHIFT));
+ regmap_write_bits(scu_ic->scu, scu_ic->reg, mask,
+ BIT(bit + ASPEED_SCU_IC_STATUS_SHIFT));
}
chained_irq_exit(chip, desc);
The issue happened randomly in runtime. The message "Link is Down" is
popped but soon it recovered to "Link is Up".
The "Link is Down" results from the incorrect read data for reading the
PHY register via MDIO bus. The correct sequence for reading the data
shall be:
1. fire the command
2. wait for command done (this step was missing)
3. wait for data idle
4. read data from data register
Fixes: f160e99462c6 ("net: phy: Add mdio-aspeed")
Cc: stable(a)vger.kernel.org
Reviewed-by: Joel Stanley <joel(a)jms.id.au>
Signed-off-by: Dylan Hung <dylan_hung(a)aspeedtech.com>
---
v2: revise commit message
drivers/net/mdio/mdio-aspeed.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/mdio/mdio-aspeed.c b/drivers/net/mdio/mdio-aspeed.c
index cad820568f75..966c3b4ad59d 100644
--- a/drivers/net/mdio/mdio-aspeed.c
+++ b/drivers/net/mdio/mdio-aspeed.c
@@ -61,6 +61,13 @@ static int aspeed_mdio_read(struct mii_bus *bus, int addr, int regnum)
iowrite32(ctrl, ctx->base + ASPEED_MDIO_CTRL);
+ rc = readl_poll_timeout(ctx->base + ASPEED_MDIO_CTRL, ctrl,
+ !(ctrl & ASPEED_MDIO_CTRL_FIRE),
+ ASPEED_MDIO_INTERVAL_US,
+ ASPEED_MDIO_TIMEOUT_US);
+ if (rc < 0)
+ return rc;
+
rc = readl_poll_timeout(ctx->base + ASPEED_MDIO_DATA, data,
data & ASPEED_MDIO_DATA_IDLE,
ASPEED_MDIO_INTERVAL_US,
--
2.25.1
Having a signed (1 << 31) constant for TCR_EL2_RES1 and CPTR_EL2_TCPAC
causes the upper 32-bit to be set to 1 when assigning them to a 64-bit
variable. Bit 32 in TCR_EL2 is no longer RES0 in ARMv8.7: with FEAT_LPA2
it changes the meaning of bits 49:48 and 9:8 in the stage 1 EL2 page
table entries. As a result of the sign-extension, a non-VHE kernel can
no longer boot on a model with ARMv8.7 enabled.
CPTR_EL2 still has the top 32 bits RES0 but we should preempt any future
problems
Make these top bit constants unsigned as per commit df655b75c43f
("arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1").
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reported-by: Chris January <Chris.January(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: Marc Zyngier <maz(a)kernel.org>
---
arch/arm64/include/asm/kvm_arm.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index a39fcf318c77..01d47c5886dc 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -91,7 +91,7 @@
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
/* TCR_EL2 Registers bits */
-#define TCR_EL2_RES1 ((1 << 31) | (1 << 23))
+#define TCR_EL2_RES1 ((1U << 31) | (1 << 23))
#define TCR_EL2_TBI (1 << 20)
#define TCR_EL2_PS_SHIFT 16
#define TCR_EL2_PS_MASK (7 << TCR_EL2_PS_SHIFT)
@@ -276,7 +276,7 @@
#define CPTR_EL2_TFP_SHIFT 10
/* Hyp Coprocessor Trap Register */
-#define CPTR_EL2_TCPAC (1 << 31)
+#define CPTR_EL2_TCPAC (1U << 31)
#define CPTR_EL2_TAM (1 << 30)
#define CPTR_EL2_TTA (1 << 20)
#define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT)
From: Stefano Stabellini <stefano.stabellini(a)xilinx.com>
If the xenstore page hasn't been allocated properly, reading the value
of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
error. Instead, it will succeed and return zero. Instead of attempting
to xen_remap a bad guest physical address, detect this condition and
return early.
Note that although a guest physical address of zero for
HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
and zero has never been validly used in that capacity.
Also recognize all bits set as an invalid value.
For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
above ULONG_MAX should never be passed by the Xen tools to HVM guests
anyway, so check for this condition and return early.
Cc: stable(a)vger.kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini(a)xilinx.com>
---
Changes in v4:
- say "all bits set" instead of INVALID_PFN
- improve check
Changes in v3:
- improve in-code comment
- improve check
Changes in v2:
- add check for ULLONG_MAX (unitialized)
- add check for ULONG_MAX #if BITS_PER_LONG == 32 (actual error)
- add pr_err error message
drivers/xen/xenbus/xenbus_probe.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 94405bb3829e..251b26439733 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -951,6 +951,29 @@ static int __init xenbus_init(void)
err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
if (err)
goto out_error;
+ /*
+ * Uninitialized hvm_params are zero and return no error.
+ * Although it is theoretically possible to have
+ * HVM_PARAM_STORE_PFN set to zero on purpose, in reality it is
+ * not zero when valid. If zero, it means that Xenstore hasn't
+ * been properly initialized. Instead of attempting to map a
+ * wrong guest physical address return error.
+ *
+ * Also recognize all bits set as an invalid value.
+ */
+ if (!v || !~v) {
+ err = -ENOENT;
+ goto out_error;
+ }
+ /* Avoid truncation on 32-bit. */
+#if BITS_PER_LONG == 32
+ if (v > ULONG_MAX) {
+ pr_err("%s: cannot handle HVM_PARAM_STORE_PFN=%llx > ULONG_MAX\n",
+ __func__, v);
+ err = -EINVAL;
+ goto out_error;
+ }
+#endif
xen_store_gfn = (unsigned long)v;
xen_store_interface =
xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
--
2.25.1
Regression found on arm gcc-11 builds with tinyconfig and allnoconfig.
Following build warnings / errors reported on stable-rc 4.9.
metadata:
git_describe: v4.9.290-208-gb2ae18f41670
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
git_short_log: b2ae18f41670 (\"Linux 4.9.291-rc1\")
target_arch: arm
toolchain: gcc-11 / gcc-10 / gcc-9 / gcc-8
build error :
--------------
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm
CROSS_COMPILE=arm-linux-gnueabihf- 'CC=sccache
arm-linux-gnueabihf-gcc' 'HOSTCC=sccache gcc'
In file included from arch/arm/include/asm/tlb.h:28,
from arch/arm/mm/init.c:34:
include/asm-generic/tlb.h: In function 'tlb_flush_pmd_range':
include/asm-generic/tlb.h:208:54: error: 'PMD_SIZE' undeclared (first
use in this function); did you mean 'PUD_SIZE'?
208 | if (tlb->page_size != 0 && tlb->page_size != PMD_SIZE)
| ^~~~~~~~
| PUD_SIZE
include/asm-generic/tlb.h:208:54: note: each undeclared identifier is
reported only once for each function it appears in
make[2]: *** [scripts/Makefile.build:307: arch/arm/mm/init.o] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1036: arch/arm/mm] Error 2
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Patch pointing to,
hugetlbfs: flush TLBs correctly after huge_pmd_unshare
commit a4a118f2eead1d6c49e00765de89878288d4b890 upstream.
When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB
flush is missing. This TLB flush must be performed before releasing the
i_mmap_rwsem, in order to prevent an unshared PMDs page from being
released and reused before the TLB flush took place.
Arguably, a comprehensive solution would use mmu_gather interface to
batch the TLB flushes and the PMDs page release, however it is not an
easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call
huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2)
deferring the release of the page reference for the PMDs page until
after i_mmap_rwsem is dropeed can confuse huge_pmd_unshare() into
thinking PMDs are shared when they are not.
Fix __unmap_hugepage_range() by adding the missing TLB flush, and
forcing a flush when unshare is successful.
Fixes: 24669e58477e ("hugetlb: use mmu_gather instead of a temporary
linked list for accumulating pages)" # 3.6
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
build link:
-----------
https://builds.tuxbuild.com//21Mf9eMMjq4oO5ZflMwRCPssSc0/build.log
build config:
-------------
https://builds.tuxbuild.com//21Mf9eMMjq4oO5ZflMwRCPssSc0/config
# To install tuxmake on your system globally
# sudo pip3 install -U tuxmake
tuxmake --runtime podman --target-arch arm --toolchain gcc-11
--kconfig tinyconfig
--
Linaro LKFT
https://lkft.linaro.org
FYI,
As you already know this build error still noticed on recent stable-rc queue/4.9
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm
CROSS_COMPILE=arm-linux-gnueabihf- 'CC=sccache
arm-linux-gnueabihf-gcc' 'HOSTCC=sccache gcc'
In file included from /builds/linux/arch/arm/include/asm/tlb.h:28,
from /builds/linux/arch/arm/mm/init.c:34:
/builds/linux/include/asm-generic/tlb.h: In function 'tlb_flush_pmd_range':
/builds/linux/include/asm-generic/tlb.h:208:47: error: 'PMD_SIZE'
undeclared (first use in this function); did you mean 'PUD_SIZE'?
208 | if (tlb->page_size != 0 && tlb->page_size != PMD_SIZE)
| ^~~~~~~~
| PUD_SIZE
metadata:
git_describe: v4.9.290-204-g1b54705bd25f
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc-queues
git_sha: 1b54705bd25fff60b31f9d6db7b9e380f70beb03
git_short_log: 1b54705bd25f (\hugetlbfs: flush TLBs correctly
after huge_pmd_unshare\)
kconfig: tinyconfig
target_arch: arm
toolchain: gcc-10 / gcc-11
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
--
Linaro LKFT
https://lkft.linaro.org
Regression found on s390 gcc-11 builds with defconfig
Following build warnings / errors reported on stable-rc 4.19.
metadata:
git_describe: v4.19.217-324-g451ddd7eb93b
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
git_short_log: 451ddd7eb93b (\"Linux 4.19.218-rc1\")
target_arch: s390
toolchain: gcc-11 / gcc-10 / gcc-9 / gcc-8
build error :
--------------
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=s390
CROSS_COMPILE=s390x-linux-gnu- 'CC=sccache s390x-linux-gnu-gcc'
'HOSTCC=sccache gcc'
arch/s390/kernel/setup.c: In function 'setup_lowcore_dat_off':
arch/s390/kernel/setup.c:342:9: warning: 'memcpy' reading 128 bytes
from a region of size 0 [-Wstringop-overread]
342 | memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
343 | sizeof(lc->stfle_fac_list));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/s390/kernel/setup.c:344:9: warning: 'memcpy' reading 128 bytes
from a region of size 0 [-Wstringop-overread]
344 | memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
345 | sizeof(lc->alt_stfle_fac_list));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/s390/kvm/kvm-s390.c: In function 'kvm_s390_get_machine':
arch/s390/kvm/kvm-s390.c:1302:9: warning: 'memcpy' reading 128 bytes
from a region of size 0 [-Wstringop-overread]
1302 | memcpy((unsigned long *)&mach->fac_list,
S390_lowcore.stfle_fac_list,
|
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1303 | sizeof(S390_lowcore.stfle_fac_list));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from arch/s390/kernel/lgr.c:13:
In function 'stfle',
inlined from 'lgr_info_get' at arch/s390/kernel/lgr.c:122:2:
arch/s390/include/asm/facility.h:88:9: warning: 'memcpy' reading 4
bytes from a region of size 0 [-Wstringop-overread]
88 | memcpy(stfle_fac_list, &S390_lowcore.stfl_fac_list, 4);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function 'pcpu_prepare_secondary',
inlined from '__cpu_up' at arch/s390/kernel/smp.c:878:2:
arch/s390/kernel/smp.c:271:9: warning: 'memcpy' reading 128 bytes from
a region of size 0 [-Wstringop-overread]
271 | memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
272 | sizeof(lc->stfle_fac_list));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/s390/kernel/smp.c:273:9: warning: 'memcpy' reading 128 bytes from
a region of size 0 [-Wstringop-overread]
273 | memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
274 | sizeof(lc->alt_stfle_fac_list));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/hugetlb.c: In function '__unmap_hugepage_range':
mm/hugetlb.c:3455:25: error: implicit declaration of function
'tlb_flush_pmd_range'; did you mean 'tlb_flush_mmu_free'?
[-Werror=implicit-function-declaration]
3455 | tlb_flush_pmd_range(tlb, address &
PUD_MASK, PUD_SIZE);
| ^~~~~~~~~~~~~~~~~~~
| tlb_flush_mmu_free
cc1: some warnings being treated as errors
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
build link:
-----------
https://builds.tuxbuild.com/21MfFQQKrxPYUW7b8amBJjt3Ki7/build.log
build config:
-------------
https://builds.tuxbuild.com/21MfFQQKrxPYUW7b8amBJjt3Ki7/config
# To install tuxmake on your system globally
# sudo pip3 install -U tuxmake
tuxmake --runtime podman --target-arch s390 --toolchain gcc-11
--kconfig defconfig
--
Linaro LKFT
https://lkft.linaro.org
When there is no policy configured on the system, the default policy is
checked in xfrm_route_forward. However, it was done with the wrong
direction (XFRM_POLICY_FWD instead of XFRM_POLICY_OUT).
The default policy for XFRM_POLICY_FWD was checked just before, with a call
to xfrm[46]_policy_check().
CC: stable(a)vger.kernel.org
Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
---
include/net/xfrm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 2308210793a0..55e574511af5 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1162,7 +1162,7 @@ static inline int xfrm_route_forward(struct sk_buff *skb, unsigned short family)
{
struct net *net = dev_net(skb->dev);
- if (xfrm_default_allow(net, XFRM_POLICY_FWD))
+ if (xfrm_default_allow(net, XFRM_POLICY_OUT))
return !net->xfrm.policy_count[XFRM_POLICY_OUT] ||
(skb_dst(skb)->flags & DST_NOXFRM) ||
__xfrm_route_forward(skb, family);
--
2.33.0
Because DAMON sleeps in uninterruptible mode, /proc/loadavg reports fake
load while DAMON is turned on, though it is doing nothing. This can
confuse users[1]. To avoid the case, this commit makes DAMON sleeps in
idle mode.
[1] https://lore.kernel.org/all/11868371.O9o76ZdvQC@natalenko.name/
Fixes: 2224d8485492 ("mm: introduce Data Access MONitor (DAMON)")
Reported-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # 5.15.x
---
I think this needs to be applied on v5.15.y, but this cannot cleanly
applied there as is. I will back-port this on v5.15.y and post later
once this is merged in the mainline.
mm/damon/core.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index daacd9536c7c..7813f47aadc9 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -12,6 +12,8 @@
#include <linux/kthread.h>
#include <linux/mm.h>
#include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/sched/debug.h>
#include <linux/slab.h>
#include <linux/string.h>
@@ -976,12 +978,25 @@ static unsigned long damos_wmark_wait_us(struct damos *scheme)
return 0;
}
+/* sleep for @usecs in idle mode */
+static void __sched damon_usleep_idle(unsigned long usecs)
+{
+ ktime_t exp = ktime_add_us(ktime_get(), usecs);
+ u64 delta = usecs * NSEC_PER_USEC / 100; /* allow 1% error */
+
+ for (;;) {
+ __set_current_state(TASK_IDLE);
+ if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
+ break;
+ }
+}
+
static void kdamond_usleep(unsigned long usecs)
{
if (usecs > 100 * 1000)
- schedule_timeout_interruptible(usecs_to_jiffies(usecs));
+ schedule_timeout_idle(usecs_to_jiffies(usecs));
else
- usleep_range(usecs, usecs + 1);
+ damon_usleep_idle(usecs);
}
/* Returns negative error code if it's not activated but should return */
@@ -1036,7 +1051,7 @@ static int kdamond_fn(void *data)
ctx->callback.after_sampling(ctx))
done = true;
- usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
+ kdamond_usleep(ctx->sample_interval);
if (ctx->primitive.check_accesses)
max_nr_accesses = ctx->primitive.check_accesses(ctx);
--
2.17.1
Regression found on arm gcc-11 builds
Following build warnings / errors reported on stable-rc queue/5.10.
metadata:
git_describe: v5.10.81-155-gca79bd042925
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc-queues
git_short_log: ca79bd042925 (\"x86/Kconfig: Fix an unused variable
error in dell-smm-hwmon\")
target_arch: arm
toolchain: gcc-11
build error :
--------------
arch/arm/boot/dts/bcm53016-meraki-mr32.dts:204.4-14: Warning
(reg_format): /srab@18007000/ports/port@0:reg: property has invalid
length (4 bytes) (#address-cells == 2, #size-cells == 1)
arch/arm/boot/dts/bcm53016-meraki-mr32.dts:209.4-14: Warning
(reg_format): /srab@18007000/ports/port@5:reg: property has invalid
length (4 bytes) (#address-cells == 2, #size-cells == 1)
arch/arm/boot/dts/bcm53016-meraki-mr32.dtb: Warning
(pci_device_bus_num): Failed prerequisite 'reg_format'
arch/arm/boot/dts/bcm53016-meraki-mr32.dtb: Warning (i2c_bus_reg):
Failed prerequisite 'reg_format'
arch/arm/boot/dts/bcm53016-meraki-mr32.dtb: Warning (spi_bus_reg):
Failed prerequisite 'reg_format'
arch/arm/boot/dts/bcm53016-meraki-mr32.dts:203.10-206.5: Warning
(avoid_default_addr_size): /srab@18007000/ports/port@0: Relying on
default #address-cells value
arch/arm/boot/dts/bcm53016-meraki-mr32.dts:203.10-206.5: Warning
(avoid_default_addr_size): /srab@18007000/ports/port@0: Relying on
default #size-cells value
arch/arm/boot/dts/bcm53016-meraki-mr32.dts:208.10-217.5: Warning
(avoid_default_addr_size): /srab@18007000/ports/port@5: Relying on
default #address-cells value
arch/arm/boot/dts/bcm53016-meraki-mr32.dts:208.10-217.5: Warning
(avoid_default_addr_size): /srab@18007000/ports/port@5: Relying on
default #size-cells value
drivers/cpuidle/cpuidle-tegra.c: In function 'tegra_cpuidle_probe':
drivers/cpuidle/cpuidle-tegra.c:349:38: error:
'TEGRA_SUSPEND_NOT_READY' undeclared (first use in this function); did
you mean 'TEGRA_SUSPEND_NONE'?
349 | if (tegra_pmc_get_suspend_mode() == TEGRA_SUSPEND_NOT_READY)
| ^~~~~~~~~~~~~~~~~~~~~~~
| TEGRA_SUSPEND_NONE
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
due to the following patch,
cpuidle: tegra: Check whether PMC is ready
[ Upstream commit bdb1ffdad3b73e4d0538098fc02e2ea87a6b27cd ]
Check whether PMC is ready before proceeding with the cpuidle registration.
This fixes racing with the PMC driver probe order, which results in a
disabled deepest CC6 idling state if cpuidle driver is probed before the
PMC.
Acked-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Signed-off-by: Dmitry Osipenko <digetx(a)gmail.com>
Signed-off-by: Thierry Reding <treding(a)nvidia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
build link:
-----------
https://builds.tuxbuild.com//21Ma7IQryRIdyeUb4V2IeX4ePy8/build.log
build config:
-------------
https://builds.tuxbuild.com//21Ma7IQryRIdyeUb4V2IeX4ePy8/config
# To install tuxmake on your system globally
# sudo pip3 install -U tuxmake
tuxmake --runtime podman --target-arch arm --toolchain gcc-10
--kconfig defconfig \
--kconfig-add
https://builds.tuxbuild.com//21Ma7IQryRIdyeUb4V2IeX4ePy8/config
--
Linaro LKFT
https://lkft.linaro.org
This patch fixes a data race in commit 779750d20b93 ("shmem: split huge pages
beyond i_size under memory pressure").
Here are call traces causing race:
Call Trace 1:
shmem_unused_huge_shrink+0x3ae/0x410
? __list_lru_walk_one.isra.5+0x33/0x160
super_cache_scan+0x17c/0x190
shrink_slab.part.55+0x1ef/0x3f0
shrink_node+0x10e/0x330
kswapd+0x380/0x740
kthread+0xfc/0x130
? mem_cgroup_shrink_node+0x170/0x170
? kthread_create_on_node+0x70/0x70
ret_from_fork+0x1f/0x30
Call Trace 2:
shmem_evict_inode+0xd8/0x190
evict+0xbe/0x1c0
do_unlinkat+0x137/0x330
do_syscall_64+0x76/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
A simple explanation:
Image there are 3 items in the local list (@list).
In the first traversal, A is not deleted from @list.
1) A->B->C
^
|
pos (leave)
In the second traversal, B is deleted from @list. Concurrently, A is
deleted from @list through shmem_evict_inode() since last reference counter of
inode is dropped by other thread. Then the @list is corrupted.
2) A->B->C
^ ^
| |
evict pos (drop)
We should make sure the inode is either on the global list or deleted from
any local list before iput().
Fixed by moving inodes back to global list before we put them.
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Cc: stable(a)vger.kernel.org # v4.8+
Signed-off-by: Gang Li <ligang.bdlg(a)bytedance.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
---
Changes in v5:
- Fix a compile warning
Changes in v4:
- Rework the comments
Changes in v3:
- Add more comment.
- Use list_move(&info->shrinklist, &sbinfo->shrinklist) instead of
list_move(pos, &sbinfo->shrinklist) for consistency.
Changes in v2: https://lore.kernel.org/all/20211124030840.88455-1-ligang.bdlg@bytedance.co…
- Move spinlock to the front of iput instead of changing lock type
since iput will call evict which may cause deadlock by requesting
shrinklist_lock.
- Add call trace in commit message.
v1: https://lore.kernel.org/lkml/20211122064126.76734-1-ligang.bdlg@bytedance.c…
---
mm/shmem.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 9023103ee7d8..a6487fe0583f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -554,7 +554,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
struct shmem_inode_info *info;
struct page *page;
unsigned long batch = sc ? sc->nr_to_scan : 128;
- int removed = 0, split = 0;
+ int split = 0;
if (list_empty(&sbinfo->shrinklist))
return SHRINK_STOP;
@@ -569,7 +569,6 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
/* inode is about to be evicted */
if (!inode) {
list_del_init(&info->shrinklist);
- removed++;
goto next;
}
@@ -577,12 +576,12 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
if (round_up(inode->i_size, PAGE_SIZE) ==
round_up(inode->i_size, HPAGE_PMD_SIZE)) {
list_move(&info->shrinklist, &to_remove);
- removed++;
goto next;
}
list_move(&info->shrinklist, &list);
next:
+ sbinfo->shrinklist_len--;
if (!--batch)
break;
}
@@ -602,7 +601,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
inode = &info->vfs_inode;
if (nr_to_split && split >= nr_to_split)
- goto leave;
+ goto move_back;
page = find_get_page(inode->i_mapping,
(inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT);
@@ -616,38 +615,43 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
}
/*
- * Leave the inode on the list if we failed to lock
- * the page at this time.
+ * Move the inode on the list back to shrinklist if we failed
+ * to lock the page at this time.
*
* Waiting for the lock may lead to deadlock in the
* reclaim path.
*/
if (!trylock_page(page)) {
put_page(page);
- goto leave;
+ goto move_back;
}
ret = split_huge_page(page);
unlock_page(page);
put_page(page);
- /* If split failed leave the inode on the list */
+ /* If split failed move the inode on the list back to shrinklist */
if (ret)
- goto leave;
+ goto move_back;
split++;
drop:
list_del_init(&info->shrinklist);
- removed++;
-leave:
+ goto put;
+move_back:
+ /*
+ * Make sure the inode is either on the global list or deleted from
+ * any local list before iput() since it could be deleted in another
+ * thread once we put the inode (then the local list is corrupted).
+ */
+ spin_lock(&sbinfo->shrinklist_lock);
+ list_move(&info->shrinklist, &sbinfo->shrinklist);
+ sbinfo->shrinklist_len++;
+ spin_unlock(&sbinfo->shrinklist_lock);
+put:
iput(inode);
}
- spin_lock(&sbinfo->shrinklist_lock);
- list_splice_tail(&list, &sbinfo->shrinklist);
- sbinfo->shrinklist_len -= removed;
- spin_unlock(&sbinfo->shrinklist_lock);
-
return split;
}
--
2.20.1
This patch fixes a data race in commit 779750d20b93 ("shmem: split huge pages
beyond i_size under memory pressure").
Here are call traces causing race:
Call Trace 1:
shmem_unused_huge_shrink+0x3ae/0x410
? __list_lru_walk_one.isra.5+0x33/0x160
super_cache_scan+0x17c/0x190
shrink_slab.part.55+0x1ef/0x3f0
shrink_node+0x10e/0x330
kswapd+0x380/0x740
kthread+0xfc/0x130
? mem_cgroup_shrink_node+0x170/0x170
? kthread_create_on_node+0x70/0x70
ret_from_fork+0x1f/0x30
Call Trace 2:
shmem_evict_inode+0xd8/0x190
evict+0xbe/0x1c0
do_unlinkat+0x137/0x330
do_syscall_64+0x76/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
A simple explanation:
Image there are 3 items in the local list (@list).
In the first traversal, A is not deleted from @list.
1) A->B->C
^
|
pos (leave)
In the second traversal, B is deleted from @list. Concurrently, A is
deleted from @list through shmem_evict_inode() since last reference counter of
inode is dropped by other thread. Then the @list is corrupted.
2) A->B->C
^ ^
| |
evict pos (drop)
We should make sure the inode is either on the global list or deleted from
any local list before iput().
Fixed by moving inodes back to global list before we put them.
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Gang Li <ligang.bdlg(a)bytedance.com>
---
mm/shmem.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 9023103ee7d8..e6ccb2a076ff 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -569,7 +569,6 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
/* inode is about to be evicted */
if (!inode) {
list_del_init(&info->shrinklist);
- removed++;
goto next;
}
@@ -577,12 +576,12 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
if (round_up(inode->i_size, PAGE_SIZE) ==
round_up(inode->i_size, HPAGE_PMD_SIZE)) {
list_move(&info->shrinklist, &to_remove);
- removed++;
goto next;
}
list_move(&info->shrinklist, &list);
next:
+ sbinfo->shrinklist_len--;
if (!--batch)
break;
}
@@ -602,7 +601,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
inode = &info->vfs_inode;
if (nr_to_split && split >= nr_to_split)
- goto leave;
+ goto move_back;
page = find_get_page(inode->i_mapping,
(inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT);
@@ -616,38 +615,43 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
}
/*
- * Leave the inode on the list if we failed to lock
- * the page at this time.
+ * Move the inode on the list back to shrinklist if we failed
+ * to lock the page at this time.
*
* Waiting for the lock may lead to deadlock in the
* reclaim path.
*/
if (!trylock_page(page)) {
put_page(page);
- goto leave;
+ goto move_back;
}
ret = split_huge_page(page);
unlock_page(page);
put_page(page);
- /* If split failed leave the inode on the list */
+ /* If split failed move the inode on the list back to shrinklist */
if (ret)
- goto leave;
+ goto move_back;
split++;
drop:
list_del_init(&info->shrinklist);
- removed++;
-leave:
+ goto put;
+move_back:
+ /*
+ * Make sure the inode is either on the global list or deleted from
+ * any local list before iput() since it could be deleted in another
+ * thread once we put the inode (then the local list is corrupted).
+ */
+ spin_lock(&sbinfo->shrinklist_lock);
+ list_move(&info->shrinklist, &sbinfo->shrinklist);
+ sbinfo->shrinklist_len++;
+ spin_unlock(&sbinfo->shrinklist_lock);
+put:
iput(inode);
}
- spin_lock(&sbinfo->shrinklist_lock);
- list_splice_tail(&list, &sbinfo->shrinklist);
- sbinfo->shrinklist_len -= removed;
- spin_unlock(&sbinfo->shrinklist_lock);
-
return split;
}
--
2.20.1
This patch fixes a data race in commit 779750d20b93 ("shmem: split huge pages
beyond i_size under memory pressure").
Here are call traces:
Call Trace 1:
shmem_unused_huge_shrink+0x3ae/0x410
? __list_lru_walk_one.isra.5+0x33/0x160
super_cache_scan+0x17c/0x190
shrink_slab.part.55+0x1ef/0x3f0
shrink_node+0x10e/0x330
kswapd+0x380/0x740
kthread+0xfc/0x130
? mem_cgroup_shrink_node+0x170/0x170
? kthread_create_on_node+0x70/0x70
ret_from_fork+0x1f/0x30
Call Trace 2:
shmem_evict_inode+0xd8/0x190
evict+0xbe/0x1c0
do_unlinkat+0x137/0x330
do_syscall_64+0x76/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The simultaneous deletion of adjacent elements in the local list (@list)
by shmem_unused_huge_shrink and shmem_evict_inode will break the list.
Image there are 3 items in the local list (@list).
In the first traversal, A is not deleted from @list.
1) A->B->C
^
|
pos (leave)
In the second traversal, B is deleted from @list. Concurrently, A is
deleted from @list through shmem_evict_inode() since last reference counter of
inode is dropped by other thread. Then the @list is corrupted.
2) A->B->C
^ ^
| |
evict pos (drop)
Fix:
We should make sure the item is either on the global list or deleted from
any local list before iput().
Fixed by moving inodes that are on @list and will not be deleted back to
global list before iput.
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Gang Li <ligang.bdlg(a)bytedance.com>
---
Changes in v3:
- Add more comment.
- Use list_move(&info->shrinklist, &sbinfo->shrinklist) instead of
list_move(pos, &sbinfo->shrinklist) for consistency.
Changes in v2: https://lore.kernel.org/all/20211124030840.88455-1-ligang.bdlg@bytedance.co…
- Move spinlock to the front of iput instead of changing lock type
since iput will call evict which may cause deadlock by requesting
shrinklist_lock.
- Add call trace in commit message.
v1: https://lore.kernel.org/lkml/20211122064126.76734-1-ligang.bdlg@bytedance.c…
---
mm/shmem.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 9023103ee7d8..ab2df692bd58 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -569,7 +569,6 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
/* inode is about to be evicted */
if (!inode) {
list_del_init(&info->shrinklist);
- removed++;
goto next;
}
@@ -577,15 +576,16 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
if (round_up(inode->i_size, PAGE_SIZE) ==
round_up(inode->i_size, HPAGE_PMD_SIZE)) {
list_move(&info->shrinklist, &to_remove);
- removed++;
goto next;
}
list_move(&info->shrinklist, &list);
next:
+ removed++;
if (!--batch)
break;
}
+ sbinfo->shrinklist_len -= removed;
spin_unlock(&sbinfo->shrinklist_lock);
list_for_each_safe(pos, next, &to_remove) {
@@ -602,7 +602,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
inode = &info->vfs_inode;
if (nr_to_split && split >= nr_to_split)
- goto leave;
+ goto move_back;
page = find_get_page(inode->i_mapping,
(inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT);
@@ -616,38 +616,43 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
}
/*
- * Leave the inode on the list if we failed to lock
- * the page at this time.
+ * Move the inode on the list back to shrinklist if we failed
+ * to lock the page at this time.
*
* Waiting for the lock may lead to deadlock in the
* reclaim path.
*/
if (!trylock_page(page)) {
put_page(page);
- goto leave;
+ goto move_back;
}
ret = split_huge_page(page);
unlock_page(page);
put_page(page);
- /* If split failed leave the inode on the list */
+ /* If split failed move the inode on the list back to shrinklist */
if (ret)
- goto leave;
+ goto move_back;
split++;
drop:
list_del_init(&info->shrinklist);
- removed++;
-leave:
+ goto put;
+move_back:
+ /* inodes that are on @list and will not be deleted must be moved back to
+ * global list before iput for two reasons:
+ * 1. iput in lock: iput call shmem_evict_inode, then cause deadlock.
+ * 2. iput before lock: shmem_evict_inode may grab the inode on @list,
+ * which will cause race.
+ */
+ spin_lock(&sbinfo->shrinklist_lock);
+ list_move(&info->shrinklist, &sbinfo->shrinklist);
+ sbinfo->shrinklist_len++;
+ spin_unlock(&sbinfo->shrinklist_lock);
+put:
iput(inode);
}
- spin_lock(&sbinfo->shrinklist_lock);
- list_splice_tail(&list, &sbinfo->shrinklist);
- sbinfo->shrinklist_len -= removed;
- spin_unlock(&sbinfo->shrinklist_lock);
-
return split;
}
--
2.20.1
The issue happened randomly in runtime. The message "Link is Down" is
popped but soon it recovered to "Link is Up".
The "Link is Down" results from the incorrect read data for reading the
PHY register via MDIO bus. The correct sequence for reading the data
shall be:
1. fire the command
2. wait for command done (this step was missing)
3. wait for data idle
4. read data from data register
Fixes: f160e99462c6 ("net: phy: Add mdio-aspeed")
Cc: stable(a)vger.kernel.org
Reviewed-by: Joel Stanley <joel(a)jms.id.au>
Signed-off-by: Dylan Hung <dylan_hung(a)aspeedtech.com>
---
v2: revise commit message
drivers/net/mdio/mdio-aspeed.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/mdio/mdio-aspeed.c b/drivers/net/mdio/mdio-aspeed.c
index cad820568f75..966c3b4ad59d 100644
--- a/drivers/net/mdio/mdio-aspeed.c
+++ b/drivers/net/mdio/mdio-aspeed.c
@@ -61,6 +61,13 @@ static int aspeed_mdio_read(struct mii_bus *bus, int addr, int regnum)
iowrite32(ctrl, ctx->base + ASPEED_MDIO_CTRL);
+ rc = readl_poll_timeout(ctx->base + ASPEED_MDIO_CTRL, ctrl,
+ !(ctrl & ASPEED_MDIO_CTRL_FIRE),
+ ASPEED_MDIO_INTERVAL_US,
+ ASPEED_MDIO_TIMEOUT_US);
+ if (rc < 0)
+ return rc;
+
rc = readl_poll_timeout(ctx->base + ASPEED_MDIO_DATA, data,
data & ASPEED_MDIO_DATA_IDLE,
ASPEED_MDIO_INTERVAL_US,
--
2.25.1
Hello Greg, Sasha,
this series for 4.14-stable backports all the fixes (and their
dependencies) for Armada 3720 PCIe driver.
These include:
- fixes (and their dependencies) for pci-aardvark controller
- fixes (and their dependencies) for pinctrl-armada-37xx driver
- device-tree fixes
Basically all fixes from upstream are taken, excluding those
that need fix the emulated bridge, since that was introduced
after 4.19. (Should we backport it? It concerns only mvebu and
aardvark controllers...)
Marek
Evan Wang (1):
PCI: aardvark: Remove PCIe outbound window configuration
Frederick Lawler (1):
PCI: Add PCI_EXP_LNKCTL2_TLS* macros
Gregory CLEMENT (1):
pinctrl: armada-37xx: add missing pin: PCIe1 Wakeup
Marek Behún (4):
PCI: aardvark: Improve link training
pinctrl: armada-37xx: Correct mpp definitions
pinctrl: armada-37xx: Correct PWM pins definitions
arm64: dts: marvell: armada-37xx: Set pcie_reset_pin to gpio function
Miquel Raynal (1):
arm64: dts: marvell: armada-37xx: declare PCIe reset pin
Pali Rohár (12):
PCI: aardvark: Train link immediately after enabling training
PCI: aardvark: Issue PERST via GPIO
PCI: aardvark: Replace custom macros by standard linux/pci_regs.h
macros
PCI: aardvark: Indicate error in 'val' when config read fails
PCI: aardvark: Don't touch PCIe registers if no card connected
PCI: aardvark: Fix compilation on s390
PCI: aardvark: Move PCIe reset card code to advk_pcie_train_link()
PCI: aardvark: Update comment about disabling link training
PCI: aardvark: Configure PCIe resources from 'ranges' DT property
PCI: aardvark: Fix PCIe Max Payload Size setting
PCI: aardvark: Fix link training
PCI: aardvark: Fix checking for link up via LTSSM state
Remi Pommarel (1):
PCI: aardvark: Wait for endpoint to be ready before training link
Sergei Shtylyov (1):
PCI: aardvark: Fix I/O space page leak
Thomas Petazzoni (1):
PCI: aardvark: Introduce an advk_pcie_valid_device() helper
Wen Yang (1):
PCI: aardvark: Fix a leaked reference by adding missing of_node_put()
.../pinctrl/marvell,armada-37xx-pinctrl.txt | 26 +-
.../arm64/boot/dts/marvell/armada-3720-db.dts | 3 +
.../dts/marvell/armada-3720-espressobin.dts | 3 +
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 9 +
drivers/pci/host/pci-aardvark.c | 463 ++++++++++++++----
drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 28 +-
include/uapi/linux/pci_regs.h | 5 +
7 files changed, 430 insertions(+), 107 deletions(-)
--
2.32.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1e63117711977cc4295b2ce73de29dd17066c82 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 19 Nov 2021 16:43:58 -0800
Subject: [PATCH] proc/vmcore: fix clearing user buffer by properly using
clear_user()
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1e63117711977cc4295b2ce73de29dd17066c82 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 19 Nov 2021 16:43:58 -0800
Subject: [PATCH] proc/vmcore: fix clearing user buffer by properly using
clear_user()
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1e63117711977cc4295b2ce73de29dd17066c82 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 19 Nov 2021 16:43:58 -0800
Subject: [PATCH] proc/vmcore: fix clearing user buffer by properly using
clear_user()
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1e63117711977cc4295b2ce73de29dd17066c82 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 19 Nov 2021 16:43:58 -0800
Subject: [PATCH] proc/vmcore: fix clearing user buffer by properly using
clear_user()
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1e63117711977cc4295b2ce73de29dd17066c82 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 19 Nov 2021 16:43:58 -0800
Subject: [PATCH] proc/vmcore: fix clearing user buffer by properly using
clear_user()
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1e63117711977cc4295b2ce73de29dd17066c82 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 19 Nov 2021 16:43:58 -0800
Subject: [PATCH] proc/vmcore: fix clearing user buffer by properly using
clear_user()
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c1e63117711977cc4295b2ce73de29dd17066c82 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 19 Nov 2021 16:43:58 -0800
Subject: [PATCH] proc/vmcore: fix clearing user buffer by properly using
clear_user()
To clear a user buffer we cannot simply use memset, we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service...
kdump[420]: Kdump is using the default log level(3).
kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
kdump[465]: saving vmcore-dmesg.txt complete
kdump[467]: saving vmcore
BUG: unable to handle page fault for address: 00007f2374e01000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
Call Trace:
read_vmcore+0x236/0x2c0
proc_reg_read+0x55/0xa0
vfs_read+0x95/0x190
ksys_read+0x4f/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access. In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 30a3b66f475a..509f85148fee 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count,
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
[ Upstream commit e45e9e3998f0001079b09555db5bb3b4257f6746 ]
The KVM doesn't know whether any TLB for a specific pcid is cached in
the CPU when tdp is enabled. So it is better to flush all the guest
TLB when invalidating any single PCID context.
The case is very rare or even impossible since KVM generally doesn't
intercept CR3 write or INVPCID instructions when tdp is enabled, so the
fix is mostly for the sake of overall robustness.
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20211019110154.4091-2-jiangshanlai(a)gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kvm/x86.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c48e2b5729c5d..0644f429f848c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1091,6 +1091,18 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
unsigned long roots_to_free = 0;
int i;
+ /*
+ * MOV CR3 and INVPCID are usually not intercepted when using TDP, but
+ * this is reachable when running EPT=1 and unrestricted_guest=0, and
+ * also via the emulator. KVM's TDP page tables are not in the scope of
+ * the invalidation, but the guest's TLB entries need to be flushed as
+ * the CPU may have cached entries in its TLB for the target PCID.
+ */
+ if (unlikely(tdp_enabled)) {
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ return;
+ }
+
/*
* If neither the current CR3 nor any of the prev_roots use the given
* PCID, then nothing needs to be done here because a resync will
--
2.33.0
It looks like the current work in progress branch for 5.15.5-rc1
picked up fe1bda72c1e8 "ASoC: SOF: Intel: hda-dai: fix potential
locking issue" without also picking 868ddfcef31f "ALSA: hda:
hdac_ext_stream: fix potential locking issues" from master. This
causes builds to fail on x86_64 using clang:
...
CC [M] drivers/misc/eeprom/at25.o
CC [M] drivers/block/pktcdvd.o
LTO [M] sound/soc/codecs/snd-soc-max98088.lto.o
sound/soc/sof/intel/hda-dai.c:111:4: error: implicit declaration of
function 'snd_hdac_ext_stream_decouple_locked'
[-Werror,-Wimplicit-function-declaration]
snd_hdac_ext_stream_decouple_locked(bus, res, true);
^
sound/soc/sof/intel/hda-dai.c:111:4: note: did you mean
'snd_hdac_ext_stream_decouple'?
./include/sound/hdaudio_ext.h:91:6: note:
'snd_hdac_ext_stream_decouple' declared here
void snd_hdac_ext_stream_decouple(struct hdac_bus *bus,
^
1 error generated.
make[4]: *** [scripts/Makefile.build:277: sound/soc/sof/intel/hda-dai.o] Error 1
make[3]: *** [scripts/Makefile.build:540: sound/soc/sof/intel] Error 2
make[2]: *** [scripts/Makefile.build:540: sound/soc/sof] Error 2
make[2]: *** Waiting for unfinished jobs....
AR drivers/misc/cb710/built-in.a
CC [M] drivers/misc/cb710/core.o
CC [M] net/netfilter/xt_nfacct.o
...
After cherry-picking the second commit (868ddfcef31f) builds complete normally.
In case it's relevant there's a third commit in this same series
that's also not included: 1465d06a6d85, "ALSA: hda: hdac_stream: fix
potential locking issue in snd_hdac_stream_assign()".
Scott
From: Roman Li <Roman.Li(a)amd.com>
[Why]
After commit ("drm/amdgpu/display: add support for multiple backlights")
number of eDPs is defined while registering backlight device.
However the panel's extended caps get updated once before register call.
That leads to regression with extended caps like oled brightness control.
[How]
Update connector ext caps after register_backlight_device
Fixes: 7fd13baeb7a3a4 ("drm/amdgpu/display: add support for multiple backlights")
Link: https://www.reddit.com/r/AMDLaptops/comments/qst0fm/after_updating_to_linux…
Signed-off-by: Roman Li <Roman.Li(a)amd.com>
Tested-by: Samuel Čavoj <samuel(a)cavoj.net>
Acked-by: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Jasdeep Dhillon <Jasdeep.Dhillon(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
(cherry picked from commit dab60582685aabdae2d4ff7ce716456bd0dc7a0f)
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 1ea31dcc7a8b..a70c57e73cce 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3839,6 +3839,9 @@ static int amdgpu_dm_initialize_drm_device(struct amdgpu_device *adev)
} else if (dc_link_detect(link, DETECT_REASON_BOOT)) {
amdgpu_dm_update_connector_after_detect(aconnector);
register_backlight_device(dm, link);
+
+ if (dm->num_of_edps)
+ update_connector_ext_caps(aconnector);
if (amdgpu_dc_feature_mask & DC_PSR_MASK)
amdgpu_dm_set_psr_caps(link);
}
--
2.31.1
On Wed, Nov 24, 2021 at 02:56:17PM +0200, Sohaib Mohamed wrote:
> Hello, Greg
>
> On Wed, Nov 24, 2021, 2:45 PM Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> wrote:
>
> > From: Sohaib Mohamed <sohaib.amhmd(a)gmail.com>
> >
> > [ Upstream commit 92723ea0f11d92496687db8c9725248e9d1e5e1d ]
> >
>
> Please, remove this patch from the queue.
> This patch has a problem and should be reverted.
Dropped from all branches now, thanks.
greg k-h
Regression found on arm64 gcc-11 builds
Following build warnings / errors reported on stable-rc 5.10.
metadata:
git_describe: v5.10.81-156-g7f9de6888cf8
git_repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git_short_log: 7f9de6888cf8 (\"Linux 5.10.82-rc1\")
target_arch: arm64
toolchain: gcc-11
build error :
--------------
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- 'CC=sccache aarch64-linux-gnu-gcc'
'HOSTCC=sccache gcc'
drivers/net/ipa/ipa_endpoint.c: In function
'ipa_endpoint_init_hol_block_enable':
drivers/net/ipa/ipa_endpoint.c:723:49: error: 'IPA_VERSION_4_5'
undeclared (first use in this function); did you mean
'IPA_VERSION_4_2'?
723 | if (enable && endpoint->ipa->version >= IPA_VERSION_4_5)
| ^~~~~~~~~~~~~~~
| IPA_VERSION_4_2
drivers/net/ipa/ipa_endpoint.c:723:49: note: each undeclared
identifier is reported only once for each function it appears in
make[4]: *** [scripts/Makefile.build:280:
drivers/net/ipa/ipa_endpoint.o] Error 1
make[4]: Target '__build' not remade because of errors.
make[3]: *** [scripts/Makefile.build:497: drivers/net/ipa] Error 2
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
build link:
-----------
https://builds.tuxbuild.com//21MNrWsMVCZZPGZPpgMBMMYx2Kb/build.log
build config:
-------------
https://builds.tuxbuild.com//21MNrWsMVCZZPGZPpgMBMMYx2Kb/config
# To install tuxmake on your system globally
# sudo pip3 install -U tuxmake
tuxmake --runtime podman --target-arch arm64 --toolchain gcc-11
--kconfig defconfig \
--kconfig-add
https://builds.tuxbuild.com//21MNrWsMVCZZPGZPpgMBMMYx2Kb/config
--
Linaro LKFT
https://lkft.linaro.org
The patch titled
Subject: kmap_local: don't assume kmap PTEs are linear arrays in memory
has been removed from the -mm tree. Its filename was
kmap_local-dont-assume-kmap-ptes-are-linear-arrays-in-memory.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ard Biesheuvel <ardb(a)kernel.org>
Subject: kmap_local: don't assume kmap PTEs are linear arrays in memory
The kmap_local conversion broke the ARM architecture, because the new code
assumes that all PTEs used for creating kmaps form a linear array in
memory, and uses array indexing to look up the kmap PTE belonging to a
certain kmap index.
On ARM, this cannot work, not only because the PTE pages may be
non-adjacent in memory, but also because ARM/!LPAE interleaves hardware
entries and extended entries (carrying software-only bits) in a way that
is not compatible with array indexing.
Fortunately, this only seems to affect configurations with more than 8
CPUs, due to the way the per-CPU kmap slots are organized in memory.
Work around this by permitting an architecture to set a Kconfig symbol
that signifies that the kmap PTEs do not form a lineary array in memory,
and so the only way to locate the appropriate one is to walk the page
tables.
Link: https://lore.kernel.org/linux-arm-kernel/20211026131249.3731275-1-ardb@kern…
Link: https://lkml.kernel.org/r/20211116094737.7391-1-ardb@kernel.org
Fixes: 2a15ba82fa6c ("ARM: highmem: Switch to generic kmap atomic")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Reported-by: Quanyang Wang <quanyang.wang(a)windriver.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm/Kconfig | 1 +
mm/Kconfig | 3 +++
mm/highmem.c | 32 +++++++++++++++++++++-----------
3 files changed, 25 insertions(+), 11 deletions(-)
--- a/arch/arm/Kconfig~kmap_local-dont-assume-kmap-ptes-are-linear-arrays-in-memory
+++ a/arch/arm/Kconfig
@@ -1463,6 +1463,7 @@ config HIGHMEM
bool "High Memory Support"
depends on MMU
select KMAP_LOCAL
+ select KMAP_LOCAL_NON_LINEAR_PTE_ARRAY
help
The address space of ARM processors is only 4 Gigabytes large
and it has to accommodate user address space, kernel address
--- a/mm/highmem.c~kmap_local-dont-assume-kmap-ptes-are-linear-arrays-in-memory
+++ a/mm/highmem.c
@@ -503,16 +503,22 @@ static inline int kmap_local_calc_idx(in
static pte_t *__kmap_pte;
-static pte_t *kmap_get_pte(void)
+static pte_t *kmap_get_pte(unsigned long vaddr, int idx)
{
+ if (IS_ENABLED(CONFIG_KMAP_LOCAL_NON_LINEAR_PTE_ARRAY))
+ /*
+ * Set by the arch if __kmap_pte[-idx] does not produce
+ * the correct entry.
+ */
+ return virt_to_kpte(vaddr);
if (!__kmap_pte)
__kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
- return __kmap_pte;
+ return &__kmap_pte[-idx];
}
void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
{
- pte_t pteval, *kmap_pte = kmap_get_pte();
+ pte_t pteval, *kmap_pte;
unsigned long vaddr;
int idx;
@@ -524,9 +530,10 @@ void *__kmap_local_pfn_prot(unsigned lon
preempt_disable();
idx = arch_kmap_local_map_idx(kmap_local_idx_push(), pfn);
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
- BUG_ON(!pte_none(*(kmap_pte - idx)));
+ kmap_pte = kmap_get_pte(vaddr, idx);
+ BUG_ON(!pte_none(*kmap_pte));
pteval = pfn_pte(pfn, prot);
- arch_kmap_local_set_pte(&init_mm, vaddr, kmap_pte - idx, pteval);
+ arch_kmap_local_set_pte(&init_mm, vaddr, kmap_pte, pteval);
arch_kmap_local_post_map(vaddr, pteval);
current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
preempt_enable();
@@ -559,7 +566,7 @@ EXPORT_SYMBOL(__kmap_local_page_prot);
void kunmap_local_indexed(void *vaddr)
{
unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
- pte_t *kmap_pte = kmap_get_pte();
+ pte_t *kmap_pte;
int idx;
if (addr < __fix_to_virt(FIX_KMAP_END) ||
@@ -584,8 +591,9 @@ void kunmap_local_indexed(void *vaddr)
idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
+ kmap_pte = kmap_get_pte(addr, idx);
arch_kmap_local_pre_unmap(addr);
- pte_clear(&init_mm, addr, kmap_pte - idx);
+ pte_clear(&init_mm, addr, kmap_pte);
arch_kmap_local_post_unmap(addr);
current->kmap_ctrl.pteval[kmap_local_idx()] = __pte(0);
kmap_local_idx_pop();
@@ -607,7 +615,7 @@ EXPORT_SYMBOL(kunmap_local_indexed);
void __kmap_local_sched_out(void)
{
struct task_struct *tsk = current;
- pte_t *kmap_pte = kmap_get_pte();
+ pte_t *kmap_pte;
int i;
/* Clear kmaps */
@@ -634,8 +642,9 @@ void __kmap_local_sched_out(void)
idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ kmap_pte = kmap_get_pte(addr, idx);
arch_kmap_local_pre_unmap(addr);
- pte_clear(&init_mm, addr, kmap_pte - idx);
+ pte_clear(&init_mm, addr, kmap_pte);
arch_kmap_local_post_unmap(addr);
}
}
@@ -643,7 +652,7 @@ void __kmap_local_sched_out(void)
void __kmap_local_sched_in(void)
{
struct task_struct *tsk = current;
- pte_t *kmap_pte = kmap_get_pte();
+ pte_t *kmap_pte;
int i;
/* Restore kmaps */
@@ -663,7 +672,8 @@ void __kmap_local_sched_in(void)
/* See comment in __kmap_local_sched_out() */
idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
- set_pte_at(&init_mm, addr, kmap_pte - idx, pteval);
+ kmap_pte = kmap_get_pte(addr, idx);
+ set_pte_at(&init_mm, addr, kmap_pte, pteval);
arch_kmap_local_post_map(addr, pteval);
}
}
--- a/mm/Kconfig~kmap_local-dont-assume-kmap-ptes-are-linear-arrays-in-memory
+++ a/mm/Kconfig
@@ -890,6 +890,9 @@ config MAPPING_DIRTY_HELPERS
config KMAP_LOCAL
bool
+config KMAP_LOCAL_NON_LINEAR_PTE_ARRAY
+ bool
+
# struct io_mapping based helper. Selected by drivers that need them
config IO_MAPPING
bool
_
Patches currently in -mm which might be from ardb(a)kernel.org are
This is a partial revert of commit
29bc22ac5e5b ("binder: use euid from cred instead of using task").
Setting sender_euid using proc->cred caused some Android system test
regressions that need further investigation. It is a partial
reversion because subsequent patches rely on proc->cred.
Cc: stable(a)vger.kernel.org # 4.4+
Fixes: 29bc22ac5e5b ("binder: use euid from cred instead of using task")
Signed-off-by: Todd Kjos <tkjos(a)google.com>
Change-Id: I9b1769a3510fed250bb21859ef8beebabe034c66
---
- the issue was introduced in 5.16-rc1, so please apply to 5.16
- this should apply cleanly to all stable branches back to 4.4
that contain "binder: use euid from cred instead of using task"
drivers/android/binder.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 49fb74196d02..cffbe57a8e08 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2710,7 +2710,7 @@ static void binder_transaction(struct binder_proc *proc,
t->from = thread;
else
t->from = NULL;
- t->sender_euid = proc->cred->euid;
+ t->sender_euid = task_euid(proc->tsk);
t->to_proc = target_proc;
t->to_thread = target_thread;
t->code = tr->code;
--
2.34.0.rc1.387.gb447b232ab-goog
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 96869f193cfd26c2b47db32e4d8bcad50461df7a Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leon(a)kernel.org>
Date: Tue, 12 Oct 2021 16:15:25 +0300
Subject: [PATCH] net/mlx5: Set devlink reload feature bit for supported
devices only
Mulitport slave device doesn't support devlink reload, so instead of
complicating initialization flow with devlink_reload_enable() which
will be removed in next patch, don't set DEVLINK_F_RELOAD feature bit
for such devices.
This fixes an error when reload counters exposed (and equal zero) for
the mode that is not supported at all.
Fixes: d89ddaae1766 ("net/mlx5: Disable devlink reload for multi port slave device")
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index fa98b7b95990..a85341a41cd0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -796,6 +796,7 @@ static void mlx5_devlink_traps_unregister(struct devlink *devlink)
int mlx5_devlink_register(struct devlink *devlink)
{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
int err;
err = devlink_params_register(devlink, mlx5_devlink_params,
@@ -813,7 +814,9 @@ int mlx5_devlink_register(struct devlink *devlink)
if (err)
goto traps_reg_err;
- devlink_set_features(devlink, DEVLINK_F_RELOAD);
+ if (!mlx5_core_is_mp_slave(dev))
+ devlink_set_features(devlink, DEVLINK_F_RELOAD);
+
return 0;
traps_reg_err:
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 96869f193cfd26c2b47db32e4d8bcad50461df7a Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leon(a)kernel.org>
Date: Tue, 12 Oct 2021 16:15:25 +0300
Subject: [PATCH] net/mlx5: Set devlink reload feature bit for supported
devices only
Mulitport slave device doesn't support devlink reload, so instead of
complicating initialization flow with devlink_reload_enable() which
will be removed in next patch, don't set DEVLINK_F_RELOAD feature bit
for such devices.
This fixes an error when reload counters exposed (and equal zero) for
the mode that is not supported at all.
Fixes: d89ddaae1766 ("net/mlx5: Disable devlink reload for multi port slave device")
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index fa98b7b95990..a85341a41cd0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -796,6 +796,7 @@ static void mlx5_devlink_traps_unregister(struct devlink *devlink)
int mlx5_devlink_register(struct devlink *devlink)
{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
int err;
err = devlink_params_register(devlink, mlx5_devlink_params,
@@ -813,7 +814,9 @@ int mlx5_devlink_register(struct devlink *devlink)
if (err)
goto traps_reg_err;
- devlink_set_features(devlink, DEVLINK_F_RELOAD);
+ if (!mlx5_core_is_mp_slave(dev))
+ devlink_set_features(devlink, DEVLINK_F_RELOAD);
+
return 0;
traps_reg_err:
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e6ba5273d4ede03d075d7a116b8edad1f6115f4d Mon Sep 17 00:00:00 2001
From: Brett Creeley <brett.creeley(a)intel.com>
Date: Thu, 9 Sep 2021 14:38:09 -0700
Subject: [PATCH] ice: Fix race conditions between virtchnl handling and VF ndo
ops
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, with
one of them being the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 3f727df3b6fb..217ff5e9a6f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -650,6 +650,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1946,6 +1948,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
ice_vf_fdir_init(vf);
ice_vc_set_dflt_vf_ops(&vf->vc_ops);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -4135,6 +4139,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -4144,6 +4150,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4683,6 +4690,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ops->get_ver_msg(vf, msg);
@@ -4771,6 +4787,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4886,6 +4904,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4904,6 +4924,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4938,11 +4959,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 5ff93a08f54c..7e28ecbbe7af 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -100,6 +100,11 @@ struct ice_vc_vf_ops {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e6ba5273d4ede03d075d7a116b8edad1f6115f4d Mon Sep 17 00:00:00 2001
From: Brett Creeley <brett.creeley(a)intel.com>
Date: Thu, 9 Sep 2021 14:38:09 -0700
Subject: [PATCH] ice: Fix race conditions between virtchnl handling and VF ndo
ops
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, with
one of them being the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 3f727df3b6fb..217ff5e9a6f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -650,6 +650,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1946,6 +1948,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
ice_vf_fdir_init(vf);
ice_vc_set_dflt_vf_ops(&vf->vc_ops);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -4135,6 +4139,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -4144,6 +4150,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4683,6 +4690,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ops->get_ver_msg(vf, msg);
@@ -4771,6 +4787,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4886,6 +4904,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4904,6 +4924,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4938,11 +4959,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 5ff93a08f54c..7e28ecbbe7af 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -100,6 +100,11 @@ struct ice_vc_vf_ops {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e6ba5273d4ede03d075d7a116b8edad1f6115f4d Mon Sep 17 00:00:00 2001
From: Brett Creeley <brett.creeley(a)intel.com>
Date: Thu, 9 Sep 2021 14:38:09 -0700
Subject: [PATCH] ice: Fix race conditions between virtchnl handling and VF ndo
ops
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, with
one of them being the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 3f727df3b6fb..217ff5e9a6f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -650,6 +650,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1946,6 +1948,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
ice_vf_fdir_init(vf);
ice_vc_set_dflt_vf_ops(&vf->vc_ops);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -4135,6 +4139,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -4144,6 +4150,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4683,6 +4690,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ops->get_ver_msg(vf, msg);
@@ -4771,6 +4787,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4886,6 +4904,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4904,6 +4924,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4938,11 +4959,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 5ff93a08f54c..7e28ecbbe7af 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -100,6 +100,11 @@ struct ice_vc_vf_ops {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2ff04286a9569675948f39cec2c6ad47c3584633 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leon(a)kernel.org>
Date: Thu, 23 Sep 2021 21:12:52 +0300
Subject: [PATCH] ice: Delete always true check of PF pointer
PF pointer is always valid when PCI core calls its .shutdown() and
.remove() callbacks. There is no need to check it again.
Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series")
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 34e64533026a..aacc0b345bbe 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4593,9 +4593,6 @@ static void ice_remove(struct pci_dev *pdev)
struct ice_pf *pf = pci_get_drvdata(pdev);
int i;
- if (!pf)
- return;
-
for (i = 0; i < ICE_MAX_RESET_WAIT; i++) {
if (!ice_is_reset_in_progress(pf->state))
break;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0299faeaf8eb982103e4388af61fd94feb9c2d9f Mon Sep 17 00:00:00 2001
From: Brett Creeley <brett.creeley(a)intel.com>
Date: Wed, 5 May 2021 14:17:57 -0700
Subject: [PATCH] ice: Remove toggling of antispoof for VF trusted promiscuous
mode
Currently when a trusted VF enables promiscuous mode spoofchk will be
disabled. This is wrong and should only be modified from the
ndo_set_vf_spoofchk callback. Fix this by removing the call to toggle
spoofchk for trusted VFs.
Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support")
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Tony Brelinski <tony.brelinski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 9b699419c933..3f8f94732a1f 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -3055,24 +3055,6 @@ static int ice_vc_cfg_promiscuous_mode_msg(struct ice_vf *vf, u8 *msg)
rm_promisc = !allmulti && !alluni;
if (vsi->num_vlan || vf->port_vlan_info) {
- struct ice_vsi *pf_vsi = ice_get_main_vsi(pf);
- struct net_device *pf_netdev;
-
- if (!pf_vsi) {
- v_ret = VIRTCHNL_STATUS_ERR_PARAM;
- goto error_param;
- }
-
- pf_netdev = pf_vsi->netdev;
-
- ret = ice_set_vf_spoofchk(pf_netdev, vf->vf_id, rm_promisc);
- if (ret) {
- dev_err(dev, "Failed to update spoofchk to %s for VF %d VSI %d when setting promiscuous mode\n",
- rm_promisc ? "ON" : "OFF", vf->vf_id,
- vsi->vsi_num);
- v_ret = VIRTCHNL_STATUS_ERR_PARAM;
- }
-
if (rm_promisc)
ret = ice_cfg_vlan_pruning(vsi, true);
else
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0299faeaf8eb982103e4388af61fd94feb9c2d9f Mon Sep 17 00:00:00 2001
From: Brett Creeley <brett.creeley(a)intel.com>
Date: Wed, 5 May 2021 14:17:57 -0700
Subject: [PATCH] ice: Remove toggling of antispoof for VF trusted promiscuous
mode
Currently when a trusted VF enables promiscuous mode spoofchk will be
disabled. This is wrong and should only be modified from the
ndo_set_vf_spoofchk callback. Fix this by removing the call to toggle
spoofchk for trusted VFs.
Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support")
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Tony Brelinski <tony.brelinski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 9b699419c933..3f8f94732a1f 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -3055,24 +3055,6 @@ static int ice_vc_cfg_promiscuous_mode_msg(struct ice_vf *vf, u8 *msg)
rm_promisc = !allmulti && !alluni;
if (vsi->num_vlan || vf->port_vlan_info) {
- struct ice_vsi *pf_vsi = ice_get_main_vsi(pf);
- struct net_device *pf_netdev;
-
- if (!pf_vsi) {
- v_ret = VIRTCHNL_STATUS_ERR_PARAM;
- goto error_param;
- }
-
- pf_netdev = pf_vsi->netdev;
-
- ret = ice_set_vf_spoofchk(pf_netdev, vf->vf_id, rm_promisc);
- if (ret) {
- dev_err(dev, "Failed to update spoofchk to %s for VF %d VSI %d when setting promiscuous mode\n",
- rm_promisc ? "ON" : "OFF", vf->vf_id,
- vsi->vsi_num);
- v_ret = VIRTCHNL_STATUS_ERR_PARAM;
- }
-
if (rm_promisc)
ret = ice_cfg_vlan_pruning(vsi, true);
else
On Tue, Nov 23, 2021 at 10:05:20AM +0000, Fernandes, Francois wrote:
> Hi,
>
> First of all thanks for your very interesting website.
> We contact you today because we are looking for an information regarding the Kernels versions.
> We are using the following version : Kernel V5.4V20
>
> Regarding your table hereunder, we understand that this version will be EOL in December 2025.
>
> [cid:image004.jpg@01D7E05A.07F76050]
>
> Could you please advise :
> - What will happened in January 2026 ?
Two things:
1. most likely: a final 5.4.x version will be released and no new 5.4.x
versions will be provided after that (meaning no new security or bug fixes),
or
2. less likely: someone else will step up to maintain the 5.4 series instead
of the current stable kernel team, in which case the EOL deadline will be
extended further
> - Is the evolution to a newer version imperative ?
Yes. It is never a good idea to run a kernel version that is no longer
receiving security updates -- unless your devices run completely offline with
no external input of any kind.
Note, that you don't have to wait for the 5.4.x to reach EOL before you plan
your switch to a newer LTS tree. You should prepare for it well in advance.
> - Is this evolution a difficult operation ?
There is no simple answer to this question. It greatly depends on how you use
the kernel for your project. If you maintain many custom kernel modules, then
porting them to a newer version of the kernel can require some effort. If you
are using a vanilla kernel version running on common hardware, then switching
to a newer kernel tree could be very easy. In any case, you should plan out
proper development and testing resources.
> Thanks in advance for your help on this subject.
I have cc'd the stable list, where you can get further help for questions you
may have.
-K
From: msizanoen1 <msizanoen(a)qtmlabs.xyz>
The kernel leaks memory when a `fib` rule is present in IPv6 nftables
firewall rules and a suppress_prefix rule is present in the IPv6 routing
rules (used by certain tools such as wg-quick). In such scenarios, every
incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
After some hours of `bpftrace`-ing and source code reading, I tracked
down the issue to ca7a03c41753 ("ipv6: do not free rt if
FIB_LOOKUP_NOREF is set on suppress rule").
The problem with that change is that the generic `args->flags` always have
`FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
`RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
decreasing the refcount when needed.
How to reproduce:
- Add the following nftables rule to a prerouting chain:
meta nfproto ipv6 fib saddr . mark . iif oif missing drop
This can be done with:
sudo nft create table inet test
sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
- Run:
sudo ip -6 rule add table main suppress_prefixlength 0
- Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
with every incoming ipv6 packet.
This patch exposes the protocol-specific flags to the protocol
specific `suppress` function, and check the protocol-specific `flags`
argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
FIB_LOOKUP_NOREF when decreasing the refcount, like this.
[1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c326…
[2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c326…
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
The original author of this commit and commit message is anonymous and
is therefore unable to sign off on it. Greg suggested that I do the sign
off, extracting it from the bugzilla entry above, and post it properly.
The patch "seems to work" on first glance, but I haven't looked deeply
at it yet and therefore it doesn't have my Reviewed-by, even though I'm
submitting this patch on the author's behalf. And it should probably get
a good look from the v6 fib folks. The original author should be on this
thread to address issues that come off, and I'll shephard additional
versions that he has.
include/net/fib_rules.h | 4 +++-
net/core/fib_rules.c | 2 +-
net/ipv4/fib_rules.c | 1 +
net/ipv6/fib6_rules.c | 4 ++--
4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 4b10676c69d1..bd07484ab9dd 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -69,7 +69,7 @@ struct fib_rules_ops {
int (*action)(struct fib_rule *,
struct flowi *, int,
struct fib_lookup_arg *);
- bool (*suppress)(struct fib_rule *,
+ bool (*suppress)(struct fib_rule *, int,
struct fib_lookup_arg *);
int (*match)(struct fib_rule *,
struct flowi *, int);
@@ -218,7 +218,9 @@ INDIRECT_CALLABLE_DECLARE(int fib4_rule_action(struct fib_rule *rule,
struct fib_lookup_arg *arg));
INDIRECT_CALLABLE_DECLARE(bool fib6_rule_suppress(struct fib_rule *rule,
+ int flags,
struct fib_lookup_arg *arg));
INDIRECT_CALLABLE_DECLARE(bool fib4_rule_suppress(struct fib_rule *rule,
+ int flags,
struct fib_lookup_arg *arg));
#endif
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 79df7cd9dbc1..1bb567a3b329 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -323,7 +323,7 @@ int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl,
if (!err && ops->suppress && INDIRECT_CALL_MT(ops->suppress,
fib6_rule_suppress,
fib4_rule_suppress,
- rule, arg))
+ rule, flags, arg))
continue;
if (err != -EAGAIN) {
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index ce54a30c2ef1..364ad3446b2f 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -141,6 +141,7 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_action(struct fib_rule *rule,
}
INDIRECT_CALLABLE_SCOPE bool fib4_rule_suppress(struct fib_rule *rule,
+ int flags,
struct fib_lookup_arg *arg)
{
struct fib_result *result = (struct fib_result *) arg->result;
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 40f3e4f9f33a..dcedfe29d9d9 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -267,6 +267,7 @@ INDIRECT_CALLABLE_SCOPE int fib6_rule_action(struct fib_rule *rule,
}
INDIRECT_CALLABLE_SCOPE bool fib6_rule_suppress(struct fib_rule *rule,
+ int flags,
struct fib_lookup_arg *arg)
{
struct fib6_result *res = arg->result;
@@ -294,8 +295,7 @@ INDIRECT_CALLABLE_SCOPE bool fib6_rule_suppress(struct fib_rule *rule,
return false;
suppress_route:
- if (!(arg->flags & FIB_LOOKUP_NOREF))
- ip6_rt_put(rt);
+ ip6_rt_put_flags(rt, flags);
return true;
}
--
2.34.0
From: Longpeng <longpeng2(a)huawei.com>
The system will crash if we put an uninitialized iova_domain, this
could happen when an error occurs before initializing the iova_domain
in vdpasim_create().
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:__cpuhp_state_remove_instance+0x96/0x1c0
...
Call Trace:
<TASK>
put_iova_domain+0x29/0x220
vdpasim_free+0xd1/0x120 [vdpa_sim]
vdpa_release_dev+0x21/0x40 [vdpa]
device_release+0x33/0x90
kobject_release+0x63/0x160
vdpasim_create+0x127/0x2a0 [vdpa_sim]
vdpasim_net_dev_add+0x7d/0xfe [vdpa_sim_net]
vdpa_nl_cmd_dev_add_set_doit+0xe1/0x1a0 [vdpa]
genl_family_rcv_msg_doit+0x112/0x140
genl_rcv_msg+0xdf/0x1d0
...
So we must make sure the iova_domain is already initialized before
put it.
In addition, we may get the following warning in this case:
WARNING: ... drivers/iommu/iova.c:344 iova_cache_put+0x58/0x70
So we must make sure the iova_cache_put() is invoked only if the
iova_cache_get() is already invoked. Let's fix it together.
Cc: stable(a)vger.kernel.org
Fixes: 4080fc106750 ("vdpa_sim: use iova module to allocate IOVA addresses")
Signed-off-by: Longpeng <longpeng2(a)huawei.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare(a)redhat.com>
---
Changes v2 -> v1:
- add "Fixes" tag [Parav, Jason]
---
drivers/vdpa/vdpa_sim/vdpa_sim.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 5f484ff..41b0cd1 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -591,8 +591,11 @@ static void vdpasim_free(struct vdpa_device *vdpa)
vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
}
- put_iova_domain(&vdpasim->iova);
- iova_cache_put();
+ if (vdpa_get_dma_dev(vdpa)) {
+ put_iova_domain(&vdpasim->iova);
+ iova_cache_put();
+ }
+
kvfree(vdpasim->buffer);
if (vdpasim->iommu)
vhost_iotlb_free(vdpasim->iommu);
--
1.8.3.1