This is a note to let you know that I've just added the patch titled
misc: rtsx: Avoid mangling IRQ during runtime PM
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 0edeb8992db8e7de9b8fe3164ace9a4356b17021 Mon Sep 17 00:00:00 2001
From: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Date: Fri, 26 Nov 2021 08:32:44 +0800
Subject: misc: rtsx: Avoid mangling IRQ during runtime PM
After commit 5b4258f6721f ("misc: rtsx: rts5249 support runtime PM"), when the
rtsx controller is runtime suspended, bring CPUs offline and back online, the
runtime resume of the controller will fail:
[ 47.319391] smpboot: CPU 1 is now offline
[ 47.414140] x86: Booting SMP configuration:
[ 47.414147] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 47.571334] smpboot: CPU 2 is now offline
[ 47.686055] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 47.808174] smpboot: CPU 3 is now offline
[ 47.878146] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 48.003679] smpboot: CPU 4 is now offline
[ 48.086187] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 48.239627] smpboot: CPU 5 is now offline
[ 48.326059] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 48.472193] smpboot: CPU 6 is now offline
[ 48.574181] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 48.743375] smpboot: CPU 7 is now offline
[ 48.838047] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 48.965447] __common_interrupt: 1.35 No irq handler for vector
[ 51.174065] mmc0: error -110 doing runtime resume
[ 54.978088] I/O error, dev mmcblk0, sector 21479 op 0x1:(WRITE) flags 0x0 phys_seg 11 prio class 0
[ 54.978108] Buffer I/O error on dev mmcblk0p1, logical block 19431, lost async page write
[ 54.978129] Buffer I/O error on dev mmcblk0p1, logical block 19432, lost async page write
[ 54.978134] Buffer I/O error on dev mmcblk0p1, logical block 19433, lost async page write
[ 54.978137] Buffer I/O error on dev mmcblk0p1, logical block 19434, lost async page write
[ 54.978141] Buffer I/O error on dev mmcblk0p1, logical block 19435, lost async page write
[ 54.978145] Buffer I/O error on dev mmcblk0p1, logical block 19436, lost async page write
[ 54.978148] Buffer I/O error on dev mmcblk0p1, logical block 19437, lost async page write
[ 54.978152] Buffer I/O error on dev mmcblk0p1, logical block 19438, lost async page write
[ 54.978155] Buffer I/O error on dev mmcblk0p1, logical block 19439, lost async page write
[ 54.978160] Buffer I/O error on dev mmcblk0p1, logical block 19440, lost async page write
[ 54.978244] mmc0: card aaaa removed
[ 54.978452] FAT-fs (mmcblk0p1): FAT read failed (blocknr 4257)
There's interrupt immediately raised on rtsx_pci_write_register() in
runtime resume routine, but the IRQ handler hasn't registered yet.
So we can either move rtsx_pci_write_register() after rtsx_pci_acquire_irq(),
or just stop mangling IRQ on runtime PM. Choose the latter to save some
CPU cycles.
Fixes: 5b4258f6721f ("misc: rtsx: rts5249 support runtime PM")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1951784
Link: https://lore.kernel.org/r/20211126003246.1068770-1-kai.heng.feng@canonical.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/cardreader/rtsx_pcr.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/misc/cardreader/rtsx_pcr.c b/drivers/misc/cardreader/rtsx_pcr.c
index 8c72eb590f79..6ac509c1821c 100644
--- a/drivers/misc/cardreader/rtsx_pcr.c
+++ b/drivers/misc/cardreader/rtsx_pcr.c
@@ -1803,8 +1803,6 @@ static int rtsx_pci_runtime_suspend(struct device *device)
mutex_lock(&pcr->pcr_mutex);
rtsx_pci_power_off(pcr, HOST_ENTER_S3);
- free_irq(pcr->irq, (void *)pcr);
-
mutex_unlock(&pcr->pcr_mutex);
pcr->is_runtime_suspended = true;
@@ -1825,8 +1823,6 @@ static int rtsx_pci_runtime_resume(struct device *device)
mutex_lock(&pcr->pcr_mutex);
rtsx_pci_write_register(pcr, HOST_SLEEP_STATE, 0x03, 0x00);
- rtsx_pci_acquire_irq(pcr);
- synchronize_irq(pcr->irq);
if (pcr->ops->fetch_vendor_settings)
pcr->ops->fetch_vendor_settings(pcr);
--
2.34.1
This is a note to let you know that I've just added the patch titled
nvmem: eeprom: at25: fix FRAM byte_len
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 9a626577398c24ecab63c0a684436c8928092367 Mon Sep 17 00:00:00 2001
From: Ralph Siemsen <ralph.siemsen(a)linaro.org>
Date: Mon, 8 Nov 2021 13:16:27 -0500
Subject: nvmem: eeprom: at25: fix FRAM byte_len
Commit fd307a4ad332 ("nvmem: prepare basics for FRAM support") added
support for FRAM devices such as the Cypress FM25V. During testing, it
was found that the FRAM detects properly, however reads and writes fail.
Upon further investigation, two problem were found in at25_probe() routine.
1) In the case of an FRAM device without platform data, eg.
fram == true && spi->dev.platform_data == NULL
the stack local variable "struct spi_eeprom chip" is not initialized
fully, prior to being copied into at25->chip. The chip.flags field in
particular can cause problems.
2) The byte_len of FRAM is computed from its ID register, and is stored
into the stack local "struct spi_eeprom chip" structure. This happens
after the same structure has been copied into at25->chip. As a result,
at25->chip.byte_len does not contain the correct length of the device.
In turn this can cause checks at beginning of at25_ee_read() to fail
(or equally, it could allow reads beyond the end of the device length).
Fix both of these issues by eliminating the on-stack struct spi_eeprom.
Instead use the one inside at25_data structure, which starts of zeroed.
Fixes: fd307a4ad332 ("nvmem: prepare basics for FRAM support")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Ralph Siemsen <ralph.siemsen(a)linaro.org>
Link: https://lore.kernel.org/r/20211108181627.645638-1-ralph.siemsen@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/eeprom/at25.c | 38 ++++++++++++++++++--------------------
1 file changed, 18 insertions(+), 20 deletions(-)
diff --git a/drivers/misc/eeprom/at25.c b/drivers/misc/eeprom/at25.c
index 632325474233..b38978a3b3ff 100644
--- a/drivers/misc/eeprom/at25.c
+++ b/drivers/misc/eeprom/at25.c
@@ -376,7 +376,6 @@ MODULE_DEVICE_TABLE(spi, at25_spi_ids);
static int at25_probe(struct spi_device *spi)
{
struct at25_data *at25 = NULL;
- struct spi_eeprom chip;
int err;
int sr;
u8 id[FM25_ID_LEN];
@@ -389,15 +388,18 @@ static int at25_probe(struct spi_device *spi)
if (match && !strcmp(match->compatible, "cypress,fm25"))
is_fram = 1;
+ at25 = devm_kzalloc(&spi->dev, sizeof(struct at25_data), GFP_KERNEL);
+ if (!at25)
+ return -ENOMEM;
+
/* Chip description */
- if (!spi->dev.platform_data) {
- if (!is_fram) {
- err = at25_fw_to_chip(&spi->dev, &chip);
- if (err)
- return err;
- }
- } else
- chip = *(struct spi_eeprom *)spi->dev.platform_data;
+ if (spi->dev.platform_data) {
+ memcpy(&at25->chip, spi->dev.platform_data, sizeof(at25->chip));
+ } else if (!is_fram) {
+ err = at25_fw_to_chip(&spi->dev, &at25->chip);
+ if (err)
+ return err;
+ }
/* Ping the chip ... the status register is pretty portable,
* unlike probing manufacturer IDs. We do expect that system
@@ -409,12 +411,7 @@ static int at25_probe(struct spi_device *spi)
return -ENXIO;
}
- at25 = devm_kzalloc(&spi->dev, sizeof(struct at25_data), GFP_KERNEL);
- if (!at25)
- return -ENOMEM;
-
mutex_init(&at25->lock);
- at25->chip = chip;
at25->spi = spi;
spi_set_drvdata(spi, at25);
@@ -431,7 +428,7 @@ static int at25_probe(struct spi_device *spi)
dev_err(&spi->dev, "Error: unsupported size (id %02x)\n", id[7]);
return -ENODEV;
}
- chip.byte_len = int_pow(2, id[7] - 0x21 + 4) * 1024;
+ at25->chip.byte_len = int_pow(2, id[7] - 0x21 + 4) * 1024;
if (at25->chip.byte_len > 64 * 1024)
at25->chip.flags |= EE_ADDR3;
@@ -464,7 +461,7 @@ static int at25_probe(struct spi_device *spi)
at25->nvmem_config.type = is_fram ? NVMEM_TYPE_FRAM : NVMEM_TYPE_EEPROM;
at25->nvmem_config.name = dev_name(&spi->dev);
at25->nvmem_config.dev = &spi->dev;
- at25->nvmem_config.read_only = chip.flags & EE_READONLY;
+ at25->nvmem_config.read_only = at25->chip.flags & EE_READONLY;
at25->nvmem_config.root_only = true;
at25->nvmem_config.owner = THIS_MODULE;
at25->nvmem_config.compat = true;
@@ -474,17 +471,18 @@ static int at25_probe(struct spi_device *spi)
at25->nvmem_config.priv = at25;
at25->nvmem_config.stride = 1;
at25->nvmem_config.word_size = 1;
- at25->nvmem_config.size = chip.byte_len;
+ at25->nvmem_config.size = at25->chip.byte_len;
at25->nvmem = devm_nvmem_register(&spi->dev, &at25->nvmem_config);
if (IS_ERR(at25->nvmem))
return PTR_ERR(at25->nvmem);
dev_info(&spi->dev, "%d %s %s %s%s, pagesize %u\n",
- (chip.byte_len < 1024) ? chip.byte_len : (chip.byte_len / 1024),
- (chip.byte_len < 1024) ? "Byte" : "KByte",
+ (at25->chip.byte_len < 1024) ?
+ at25->chip.byte_len : (at25->chip.byte_len / 1024),
+ (at25->chip.byte_len < 1024) ? "Byte" : "KByte",
at25->chip.name, is_fram ? "fram" : "eeprom",
- (chip.flags & EE_READONLY) ? " (readonly)" : "",
+ (at25->chip.flags & EE_READONLY) ? " (readonly)" : "",
at25->chip.page_size);
return 0;
}
--
2.34.1
This is a note to let you know that I've just added the patch titled
usb: cdnsp: Fix a NULL pointer dereference in cdnsp_endpoint_init()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 37307f7020ab38dde0892a578249bf63d00bca64 Mon Sep 17 00:00:00 2001
From: Zhou Qingyang <zhou1615(a)umn.edu>
Date: Wed, 1 Dec 2021 01:27:00 +0800
Subject: usb: cdnsp: Fix a NULL pointer dereference in cdnsp_endpoint_init()
In cdnsp_endpoint_init(), cdnsp_ring_alloc() is assigned to pep->ring
and there is a dereference of it in cdnsp_endpoint_init(), which could
lead to a NULL pointer dereference on failure of cdnsp_ring_alloc().
Fix this bug by adding a check of pep->ring.
This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.
Builds with CONFIG_USB_CDNSP_GADGET=y show no new warnings,
and our static analyzer no longer warns about this code.
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Pawel Laszczak <pawell(a)cadence.com>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Signed-off-by: Zhou Qingyang <zhou1615(a)umn.edu>
Link: https://lore.kernel.org/r/20211130172700.206650-1-zhou1615@umn.edu
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdnsp-mem.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/cdns3/cdnsp-mem.c b/drivers/usb/cdns3/cdnsp-mem.c
index ad9aee3f1e39..97866bfb2da9 100644
--- a/drivers/usb/cdns3/cdnsp-mem.c
+++ b/drivers/usb/cdns3/cdnsp-mem.c
@@ -987,6 +987,9 @@ int cdnsp_endpoint_init(struct cdnsp_device *pdev,
/* Set up the endpoint ring. */
pep->ring = cdnsp_ring_alloc(pdev, 2, ring_type, max_packet, mem_flags);
+ if (!pep->ring)
+ return -ENOMEM;
+
pep->skip = false;
/* Fill the endpoint context */
--
2.34.1
This is a note to let you know that I've just added the patch titled
usb: cdns3: gadget: fix new urb never complete if ep cancel previous
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 387c2b6ba197c6df28e75359f7d892f7c8dec204 Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li(a)nxp.com>
Date: Tue, 30 Nov 2021 09:42:39 -0600
Subject: usb: cdns3: gadget: fix new urb never complete if ep cancel previous
requests
This issue was found at android12 MTP.
1. MTP submit many out urb request.
2. Cancel left requests (>20) when enough data get from host
3. Send ACK by IN endpoint.
4. MTP submit new out urb request.
5. 4's urb never complete.
TRACE LOG:
MtpServer-2157 [000] d..3 1287.150391: cdns3_ep_dequeue: ep1out: req: 00000000299e6836, req buff 000000009df42287, length: 0/16384 zsi, status: -115, trb: [start:87, end:87: virt addr 0x80004000ffd50420], flags:1 SID: 0
MtpServer-2157 [000] d..3 1287.150410: cdns3_gadget_giveback: ep1out: req: 00000000299e6836, req buff 000000009df42287, length: 0/16384 zsi, status: -104, trb: [start:87, end:87: virt addr 0x80004000ffd50420], flags:0 SID: 0
MtpServer-2157 [000] d..3 1287.150433: cdns3_ep_dequeue: ep1out: req: 0000000080b7bde6, req buff 000000009ed5c556, length: 0/16384 zsi, status: -115, trb: [start:88, end:88: virt addr 0x80004000ffd5042c], flags:1 SID: 0
MtpServer-2157 [000] d..3 1287.150446: cdns3_gadget_giveback: ep1out: req: 0000000080b7bde6, req buff 000000009ed5c556, length: 0/16384 zsi, status: -104, trb: [start:88, end:88: virt addr 0x80004000ffd5042c], flags:0 SID: 0
....
MtpServer-2157 [000] d..1 1293.630410: cdns3_alloc_request: ep1out: req: 00000000afbccb7d, req buff 0000000000000000, length: 0/0 zsi, status: 0, trb: [start:0, end:0: virt addr (null)], flags:0 SID: 0
MtpServer-2157 [000] d..2 1293.630421: cdns3_ep_queue: ep1out: req: 00000000afbccb7d, req buff 00000000871caf90, length: 0/512 zsi, status: -115, trb: [start:0, end:0: virt addr (null)], flags:0 SID: 0
MtpServer-2157 [000] d..2 1293.630445: cdns3_wa1: WA1: ep1out set guard
MtpServer-2157 [000] d..2 1293.630450: cdns3_wa1: WA1: ep1out restore cycle bit
MtpServer-2157 [000] d..2 1293.630453: cdns3_prepare_trb: ep1out: trb 000000007317b3ee, dma buf: 0xffd5bc00, size: 512, burst: 128 ctrl: 0x00000424 (C=0, T=0, ISP, IOC, Normal) SID:0 LAST_SID:0
MtpServer-2157 [000] d..2 1293.630460: cdns3_doorbell_epx: ep1out, ep_trbaddr ffd50414
....
irq/241-5b13000-2154 [000] d..1 1293.680849: cdns3_epx_irq: IRQ for ep1out: 01000408 ISP , ep_traddr: ffd508ac ep_last_sid: 00000000 use_streams: 0
irq/241-5b13000-2154 [000] d..1 1293.680858: cdns3_complete_trb: ep1out: trb 0000000021a11b54, dma buf: 0xffd50420, size: 16384, burst: 128 ctrl: 0x00001810 (C=0, T=0, CHAIN, LINK) SID:0 LAST_SID:0
irq/241-5b13000-2154 [000] d..1 1293.680865: cdns3_request_handled: Req: 00000000afbccb7d not handled, DMA pos: 185, ep deq: 88, ep enq: 185, start trb: 184, end trb: 184
Actually DMA pos already bigger than previous submit request afbccb7d's TRB (184-184). The reason of (not handled) is that deq position is wrong.
The TRB link is below when irq happen.
DEQ LINK LINK LINK LINK LINK .... TRB(afbccb7d):START DMA(EP_TRADDR).
Original code check LINK TRB, but DEQ just move one step.
LINK DEQ LINK LINK LINK LINK .... TRB(afbccb7d):START DMA(EP_TRADDR).
This patch skip all LINK TRB and sync DEQ to trb's start.
LINK LINK LINK LINK LINK .... DEQ = TRB(afbccb7d):START DMA(EP_TRADDR).
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Jun Li <jun.li(a)nxp.com>
Link: https://lore.kernel.org/r/20211130154239.8029-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdns3-gadget.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)
diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
index 1f3b4a142212..f9af7ebe003d 100644
--- a/drivers/usb/cdns3/cdns3-gadget.c
+++ b/drivers/usb/cdns3/cdns3-gadget.c
@@ -337,19 +337,6 @@ static void cdns3_ep_inc_deq(struct cdns3_endpoint *priv_ep)
cdns3_ep_inc_trb(&priv_ep->dequeue, &priv_ep->ccs, priv_ep->num_trbs);
}
-static void cdns3_move_deq_to_next_trb(struct cdns3_request *priv_req)
-{
- struct cdns3_endpoint *priv_ep = priv_req->priv_ep;
- int current_trb = priv_req->start_trb;
-
- while (current_trb != priv_req->end_trb) {
- cdns3_ep_inc_deq(priv_ep);
- current_trb = priv_ep->dequeue;
- }
-
- cdns3_ep_inc_deq(priv_ep);
-}
-
/**
* cdns3_allow_enable_l1 - enable/disable permits to transition to L1.
* @priv_dev: Extended gadget object
@@ -1517,10 +1504,11 @@ static void cdns3_transfer_completed(struct cdns3_device *priv_dev,
trb = priv_ep->trb_pool + priv_ep->dequeue;
- /* Request was dequeued and TRB was changed to TRB_LINK. */
- if (TRB_FIELD_TO_TYPE(le32_to_cpu(trb->control)) == TRB_LINK) {
+ /* The TRB was changed as link TRB, and the request was handled at ep_dequeue */
+ while (TRB_FIELD_TO_TYPE(le32_to_cpu(trb->control)) == TRB_LINK) {
trace_cdns3_complete_trb(priv_ep, trb);
- cdns3_move_deq_to_next_trb(priv_req);
+ cdns3_ep_inc_deq(priv_ep);
+ trb = priv_ep->trb_pool + priv_ep->dequeue;
}
if (!request->stream_id) {
--
2.34.1
This is a note to let you know that I've just added the patch titled
USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d2a004037c3c6afd36d40c384d2905f47cd51c57 Mon Sep 17 00:00:00 2001
From: Ole Ernst <olebowle(a)gmx.com>
Date: Sat, 27 Nov 2021 10:05:45 +0100
Subject: USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
This is another branded 8153 device that doesn't work well with LPM:
r8152 2-2.1:1.0 enp0s13f0u2u1: Stop submitting intr, status -71
Disable LPM to resolve the issue.
Signed-off-by: Ole Ernst <olebowle(a)gmx.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211127090546.52072-1-olebowle@gmx.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/core/quirks.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 8239fe7129dd..019351c0b52c 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -434,6 +434,9 @@ static const struct usb_device_id usb_quirk_list[] = {
{ USB_DEVICE(0x1532, 0x0116), .driver_info =
USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL },
+ /* Lenovo Powered USB-C Travel Hub (4X90S92381, RTL8153 GigE) */
+ { USB_DEVICE(0x17ef, 0x721e), .driver_info = USB_QUIRK_NO_LPM },
+
/* Lenovo ThinkCenter A630Z TI024Gen3 usb-audio */
{ USB_DEVICE(0x17ef, 0xa012), .driver_info =
USB_QUIRK_DISCONNECT_SUSPEND },
--
2.34.1
Add the missing bulk-endpoint max-packet sanity check to probe() to
avoid division by zero in uvc_alloc_urb_buffers() in case a malicious
device has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: c0efd232929c ("V4L/DVB (8145a): USB Video Class driver")
Cc: stable(a)vger.kernel.org # 2.6.26
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/media/usb/uvc/uvc_video.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index e16464606b14..85ac5c1081b6 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1958,6 +1958,10 @@ static int uvc_video_start_transfer(struct uvc_streaming *stream,
if (ep == NULL)
return -EIO;
+ /* Reject broken descriptors. */
+ if (usb_endpoint_maxp(&ep->desc) == 0)
+ return -EIO;
+
ret = uvc_init_video_bulk(stream, ep, gfp_flags);
}
--
2.32.0
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
commit 164051a6ab5445bd97f719f50b16db8b32174269 upstream.
The TP.CM_BAM message must be sent to the global address [1], so add a
check to drop TP.CM_BAM sent to a non-global address.
Without this patch, the receiver will treat the following packets as
normal RTS/CTS transport:
18EC0102#20090002FF002301
18EB0102#0100000000000000
18EB0102#020000FFFFFFFFFF
[1] SAE-J1939-82 2015 A.3.3 Row 1.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/all/1635431907-15617-4-git-send-email-zhangchangzho…
Link: https://lore.kernel.org/all/20211201102549.3079360-1-o.rempel@pengutronix.de
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
changes:
- rebase against v5.4.162
net/can/j1939/transport.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index 811682e06951..22f4b798d385 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -2004,6 +2004,12 @@ static void j1939_tp_cmd_recv(struct j1939_priv *priv, struct sk_buff *skb)
extd = J1939_ETP;
/* fall through */
case J1939_TP_CMD_BAM: /* fall through */
+ if (cmd == J1939_TP_CMD_BAM && !j1939_cb_is_broadcast(skcb)) {
+ netdev_err_once(priv->ndev, "%s: BAM to unicast (%02x), ignoring!\n",
+ __func__, skcb->addr.sa);
+ return;
+ }
+ fallthrough;
case J1939_TP_CMD_RTS: /* fall through */
if (skcb->addr.type != extd)
return;
--
2.33.0
A number of users have reported that under certain conditions using
the overlay filesystem, copy_file_range() can unexpectedly create a
0-byte file. [0]
This bug can cause significant problems because applications that copy
files expect the target file to match the source immediately after the
copy. After upgrading from Linux 5.4 to Linux 5.10, our Docker-based
CI tests started failing due to this bug, since Ruby's IO.copy_stream
uses this system call. We have worked around the problem by touching
the target file before using it, but this shouldn't be necessary.
Other projects, such as Rust, have added similar workarounds. [1]
As discussed in the linux-fsdevel mailing list [2], the bug appears to
be present in Linux 5.6 to 5.10, but not in Linux 5.11. We should be
able to cherry-pick the following upstream patches to fix this. Could
you cherry-pick them to 5.10.x stable? I've confirmed that these
patches, applied from top to bottom to that branch, pass the
reproduction test [3]:
82a763e61e2b601309d696d4fa514c77d64ee1be
9b91b6b019fda817eb52f728eb9c79b3579760bc
The diffstat:
fs/overlayfs/file.c | 59
+++++++++++++++++++++++++++++++----------------------------
1 file changed, 31 insertions(+), 28 deletions(-)
Note that these patches do not pick cleanly into 5.6.x - 5.9.x stable.
[0] https://github.com/docker/for-linux/issues/1015
[1] https://github.com/rust-lang/rust/blob/342db70ae4ecc3cd17e4fa6497f0a8d9534c…
[2] https://marc.info/?l=linux-fsdevel&m=163847383311699&w=2
[3] https://github.com/docker/for-linux/issues/1015#issuecomment-841915668
From: Joerg Roedel <jroedel(a)suse.de>
The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff
range of kernel memory (with 4-level paging). This range contains the
kernels text+data+bss mappings and the module mapping space, but not the
direct mapping and the vmalloc area.
This is enough to get an application processors out of real-mode, but
for code that switches back to real-mode the trampoline_pgd is missing
important parts of the address space. For example, consider this code
from arch/x86/kernel/reboot.c, function machine_real_restart() for a
64-bit kernel:
#ifdef CONFIG_X86_32
load_cr3(initial_page_table);
#else
write_cr3(real_mode_header->trampoline_pgd);
/* Exiting long mode will fail if CR4.PCIDE is set. */
if (boot_cpu_has(X86_FEATURE_PCID))
cr4_clear_bits(X86_CR4_PCIDE);
#endif
/* Jump to the identity-mapped low memory code */
#ifdef CONFIG_X86_32
asm volatile("jmpl *%0" : :
"rm" (real_mode_header->machine_real_restart_asm),
"a" (type));
#else
asm volatile("ljmpl *%0" : :
"m" (real_mode_header->machine_real_restart_asm),
"D" (type));
#endif
The code switches to the trampoline_pgd, which unmaps the direct mapping
and also the kernel stack. The call to cr4_clear_bits() will find no
stack and crash the machine. The real_mode_header pointer below points
into the direct mapping, and dereferencing it also causes a crash.
The reason this does not crash always is only that kernel mappings are
global and the CR3 switch does not flush those mappings. But if theses
mappings are not in the TLB already, the above code will crash before it
can jump to the real-mode stub.
Extend the trampoline_pgd to contain all kernel mappings to prevent
these crashes and to make code which runs on this page-table more
robust.
Cc: stable(a)vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
---
arch/x86/realmode/init.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 6d98609387ba..c5e29db02a46 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -98,6 +98,7 @@ static void __init setup_real_mode(void)
#ifdef CONFIG_X86_64
u64 *trampoline_pgd;
u64 efer;
+ int i;
#endif
base = (unsigned char *)real_mode_header;
@@ -154,8 +155,17 @@ static void __init setup_real_mode(void)
trampoline_header->flags = 0;
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+
+ /* Map the real mode stub as virtual == physical */
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
- trampoline_pgd[511] = init_top_pgt[511].pgd;
+
+ /*
+ * Include the entirety of the kernel mapping into the trampoline
+ * PGD. This way, all mappings present in the normal kernel page
+ * tables are usable while running on trampoline_pgd.
+ */
+ for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++)
+ trampoline_pgd[i] = init_top_pgt[i].pgd;
#endif
sme_sev_setup_real_mode(trampoline_header);
--
2.34.0
The 'wake-capable' entry in channel configuration is not set when
parsing the configuration specified by the controller driver. Add
the missing entry to ensure channel is correctly specified as a
'wake-capable' channel.
Cc: stable(a)vger.kernel.org
Fixes: 0cbf260820fa ("bus: mhi: core: Add support for registering MHI controllers")
Signed-off-by: Bhaumik Bhatt <quic_bbhatt(a)quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
---
drivers/bus/mhi/core/init.c | 1 +
1 file changed, 1 insertion(+)
v2:
-Update subject as per comments
-Add fixes tag and CC stable
diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
index 5aaca6d..f1ec3441 100644
--- a/drivers/bus/mhi/core/init.c
+++ b/drivers/bus/mhi/core/init.c
@@ -788,6 +788,7 @@ static int parse_ch_cfg(struct mhi_controller *mhi_cntrl,
mhi_chan->offload_ch = ch_cfg->offload_channel;
mhi_chan->db_cfg.reset_req = ch_cfg->doorbell_mode_switch;
mhi_chan->pre_alloc = ch_cfg->auto_queue;
+ mhi_chan->wake_capable = ch_cfg->wake_capable;
/*
* If MHI host allocates buffers, then the channel direction
--
2.7.4
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
commit 164051a6ab5445bd97f719f50b16db8b32174269 upstream.
The TP.CM_BAM message must be sent to the global address [1], so add a
check to drop TP.CM_BAM sent to a non-global address.
Without this patch, the receiver will treat the following packets as
normal RTS/CTS transport:
18EC0102#20090002FF002301
18EB0102#0100000000000000
18EB0102#020000FFFFFFFFFF
[1] SAE-J1939-82 2015 A.3.3 Row 1.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/all/1635431907-15617-4-git-send-email-zhangchangzho…
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
changes:
- rebase against v5.10.82
net/can/j1939/transport.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index fe35fdad35c9..9c39b0f5d6e0 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -2004,6 +2004,12 @@ static void j1939_tp_cmd_recv(struct j1939_priv *priv, struct sk_buff *skb)
extd = J1939_ETP;
fallthrough;
case J1939_TP_CMD_BAM:
+ if (cmd == J1939_TP_CMD_BAM && !j1939_cb_is_broadcast(skcb)) {
+ netdev_err_once(priv->ndev, "%s: BAM to unicast (%02x), ignoring!\n",
+ __func__, skcb->addr.sa);
+ return;
+ }
+ fallthrough;
case J1939_TP_CMD_RTS: /* fall through */
if (skcb->addr.type != extd)
return;
--
2.30.2
commit 05abc6a5dec2a8c123a50235ecd1ad8d75ffa7b4 upstream
please consider adding the patch to the 5.4's stable release.
(Tested it with patch, where it applies cleanly.)
Reason: The commit message of the patch states that it is:
"Allow the SFP cages to be used with 2W SFP modules."
This was Reported by: 照山周一郎 <teruyama(a)springboard-inc.jp> in his
openwrt github PR [0].
Based on the reporters' e-mails in a related thread on the OpenWrt
mailinglist [1]. This aforementioned patch is required to get the popular
(in Japan) SFP module "NTT OCU" working.
Thanks,
Christian Lamparter
[0] <https://github.com/openwrt/openwrt/pull/4803>
[1] <https://patchwork.ozlabs.org/project/openwrt/patch/1638108130-24432-1-git-s…>
Summary: When attempting to rise or shut down a NIC manually or via
network-manager under 5.15, the machine reboots or freezes.
Occurs with: 5.15.4-051504-generic and earlier 5.15 mainline (
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.4/) as well as
liquorix flavours.
Does not occur with: 5.14 and 5.13 (both with various flavours)
Hi all,
I'm experiencing a severe bug that causes the machine to reboot or
freeze when trying to login and/or rise/shutdown a NIC. Here's a brief
description of scenarios I've tested:
Scenario 1: enp6s0 managed manually using /etc/networking/interfaces,
DHCP
a. Issuing ifdown enp6s0 in terminal will throw
"/etc/resolvconf/update.d/libc: Warning: /etc/resolv.conf is
not a symbolic link to /run/resolvconf/resolv.conf"
and cause the machine to reboot after ~10s of showing a blinking cursor
b. Issuing shutdown -h now or trying to shutdown/reboot machine via
GUI:
shutdown will stop on "stop job is running for ifdown enp6s0" and after
approx. 10..15s the countdown freezes. Repeated ALT-SysReq-REISUB does
not reboot the machine, a hard reset is required.
--
Scenario 2: enp6s0 managed manually using /etc/networking/interfaces,
STATIC
a. Issuing ifdown enp6s0 in terminal will throw
"send_packet: Operation not permitted
dhclient.c:3010: Failed to send 300 byte long packet over
fallback interface."
and cause the machine to reboot after ~10s of blinking cursor.
b. Issuing shutdown -h now or trying to shutdown or reboot machine via
GUI: shutdown will stop on "stop job is running for ifdown enp6s0" and
after approx. 10..15s the countdown freezes. Repeated ALT-SysReq-REISUB
does not reboot the machine, a hard reset is required.
--
Scenario 3: enp6s0 managed by network manager
a. After booting and logging in either via GUI or TTY, the display will
stay blank and only show a blinking cursor and then freeze after
5..10s. ALT-SysReq-REISUB does not reboot the machine, a hard reset is
required.
--
Here's a snippet from the journal for Scenario 1a:
Nov 21 10:39:25 computer sudo[5606]: user : TTY=pts/0 ;
PWD=/home/user ; USER=root ; COMMAND=/usr/sbin/ifdown enp6s0
Nov 21 10:39:25 computer sudo[5606]: pam_unix(sudo:session): session
opened for user root by (uid=0)
-- Reboot --
Nov 21 10:40:14 computer systemd-journald[478]: Journal started
--
I'm running Alder Lake i9 12900K but I have E-cores disabled in BIOS.
Here are some more specs with working kernel:
$ inxi -bxz
System: Kernel: 5.14.0-19.2-liquorix-amd64 x86_64 bits: 64 compiler:
N/A Desktop: Xfce 4.16.3
Distro: Ubuntu 20.04.3 LTS (Focal Fossa)
Machine: Type: Desktop System: ASUS product: N/A v: N/A serial: N/A
Mobo: ASUSTeK model: ROG STRIX Z690-A GAMING WIFI D4 v: Rev
1.xx serial: <filter>
UEFI [Legacy]: American Megatrends v: 0707 date: 11/10/2021
CPU: 8-Core: 12th Gen Intel Core i9-12900K type: MT MCP arch: N/A
speed: 5381 MHz max: 3201 MHz
Graphics: Device-1: NVIDIA vendor: Gigabyte driver: nvidia v: 470.86
bus ID: 01:00.0
Display: server: X.Org 1.20.11 driver: nvidia tty: N/A
Message: Unable to show advanced data. Required tool glxinfo
missing.
Network: Device-1: Intel vendor: ASUSTeK driver: igc v: kernel port:
4000 bus ID: 06:00.0
Please advice how I may assist in debugging!
Thanks.
From: Willy Tarreau <w(a)1wt.eu>
Ammar Faizi reported that our exit code handling is wrong. We truncate
it to the lowest 8 bits but the syscall itself is expected to take a
regular 32-bit signed integer, not an unsigned char. It's the kernel
that later truncates it to the lowest 8 bits. The difference is visible
in strace, where the program below used to show exit(255) instead of
exit(-1):
int main(void)
{
return -1;
}
This patch applies the fix to all archs. x86_64, i386, arm64, armv7 and
mips were all tested and confirmed to work fine now. Risc-v was not
tested but the change is trivial and exactly the same as for other archs.
Reported-by: Ammar Faizi <ammar.faizi(a)students.amikom.ac.id>
Cc: stable(a)vger.kernel.org
Signed-off-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
---
tools/include/nolibc/nolibc.h | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 7f300dc379e70..3e2c6f2ed587f 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -414,7 +414,7 @@ asm(".section .text\n"
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
- "movzb %al, %rdi\n" // retrieve exit code from 8 lower bits
+ "mov %eax, %edi\n" // retrieve exit code (32 bit)
"mov $60, %rax\n" // NR_exit == 60
"syscall\n" // really exit
"hlt\n" // ensure it does not return
@@ -602,9 +602,9 @@ asm(".section .text\n"
"push %ebx\n" // support both regparm and plain stack modes
"push %eax\n"
"call main\n" // main() returns the status code in %eax
- "movzbl %al, %ebx\n" // retrieve exit code from lower 8 bits
- "movl $1, %eax\n" // NR_exit == 1
- "int $0x80\n" // exit now
+ "mov %eax, %ebx\n" // retrieve exit code (32-bit int)
+ "movl $1, %eax\n" // NR_exit == 1
+ "int $0x80\n" // exit now
"hlt\n" // ensure it does not
"");
@@ -788,7 +788,6 @@ asm(".section .text\n"
"and %r3, %r1, $-8\n" // AAPCS : sp must be 8-byte aligned in the
"mov %sp, %r3\n" // callee, an bl doesn't push (lr=pc)
"bl main\n" // main() returns the status code, we'll exit with it.
- "and %r0, %r0, $0xff\n" // limit exit code to 8 bits
"movs r7, $1\n" // NR_exit == 1
"svc $0x00\n"
"");
@@ -985,7 +984,6 @@ asm(".section .text\n"
"add x2, x2, x1\n" // + argv
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
- "and x0, x0, 0xff\n" // limit exit code to 8 bits
"mov x8, 93\n" // NR_exit == 93
"svc #0\n"
"");
@@ -1190,7 +1188,7 @@ asm(".section .text\n"
"addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there!
"jal main\n" // main() returns the status code, we'll exit with it.
"nop\n" // delayed slot
- "and $a0, $v0, 0xff\n" // limit exit code to 8 bits
+ "move $a0, $v0\n" // retrieve 32-bit exit code from v0
"li $v0, 4001\n" // NR_exit == 4001
"syscall\n"
".end __start\n"
@@ -1388,7 +1386,6 @@ asm(".section .text\n"
"add a2,a2,a1\n" // + argv
"andi sp,a1,-16\n" // sp must be 16-byte aligned
"call main\n" // main() returns the status code, we'll exit with it.
- "andi a0, a0, 0xff\n" // limit exit code to 8 bits
"li a7, 93\n" // NR_exit == 93
"ecall\n"
"");
--
2.31.1.189.g2e36527f23
From: Willy Tarreau <w(a)1wt.eu>
After re-checking in the spec and comparing stack offsets with glibc,
The last pushed argument must be 16-byte aligned (i.e. aligned before the
call) so that in the callee esp+4 is multiple of 16, so the principle is
the 32-bit equivalent to what Ammar fixed for x86_64. It's possible that
32-bit code using SSE2 or MMX could have been affected. In addition the
frame pointer ought to be zero at the deepest level.
Link: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
Cc: Ammar Faizi <ammar.faizi(a)students.amikom.ac.id>
Cc: stable(a)vger.kernel.org
Signed-off-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
---
tools/include/nolibc/nolibc.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 96b6d56acb572..7f300dc379e70 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -583,13 +583,21 @@ struct sys_stat_struct {
})
/* startup code */
+/*
+ * i386 System V ABI mandates:
+ * 1) last pushed argument must be 16-byte aligned.
+ * 2) The deepest stack frame should be set to zero
+ *
+ */
asm(".section .text\n"
".global _start\n"
"_start:\n"
"pop %eax\n" // argc (first arg, %eax)
"mov %esp, %ebx\n" // argv[] (second arg, %ebx)
"lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
- "and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned when
+ "xor %ebp, %ebp\n" // zero the stack frame
+ "and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned before
+ "sub $4, %esp\n" // the call instruction (args are aligned)
"push %ecx\n" // push all registers on the stack so that we
"push %ebx\n" // support both regparm and plain stack modes
"push %eax\n"
--
2.31.1.189.g2e36527f23
From: Ammar Faizi <ammar.faizi(a)students.amikom.ac.id>
Before this patch, the `_start` function looks like this:
```
0000000000001170 <_start>:
1170: pop %rdi
1171: mov %rsp,%rsi
1174: lea 0x8(%rsi,%rdi,8),%rdx
1179: and $0xfffffffffffffff0,%rsp
117d: sub $0x8,%rsp
1181: call 1000 <main>
1186: movzbq %al,%rdi
118a: mov $0x3c,%rax
1191: syscall
1193: hlt
1194: data16 cs nopw 0x0(%rax,%rax,1)
119f: nop
```
Note the "and" to %rsp with $-16, it makes the %rsp be 16-byte aligned,
but then there is a "sub" with $0x8 which makes the %rsp no longer
16-byte aligned, then it calls main. That's the bug!
What actually the x86-64 System V ABI mandates is that right before the
"call", the %rsp must be 16-byte aligned, not after the "call". So the
"sub" with $0x8 here breaks the alignment. Remove it.
An example where this rule matters is when the callee needs to align
its stack at 16-byte for aligned move instruction, like `movdqa` and
`movaps`. If the callee can't align its stack properly, it will result
in segmentation fault.
x86-64 System V ABI also mandates the deepest stack frame should be
zero. Just to be safe, let's zero the %rbp on startup as the content
of %rbp may be unspecified when the program starts. Now it looks like
this:
```
0000000000001170 <_start>:
1170: pop %rdi
1171: mov %rsp,%rsi
1174: lea 0x8(%rsi,%rdi,8),%rdx
1179: xor %ebp,%ebp # zero the %rbp
117b: and $0xfffffffffffffff0,%rsp # align the %rsp
117f: call 1000 <main>
1184: movzbq %al,%rdi
1188: mov $0x3c,%rax
118f: syscall
1191: hlt
1192: data16 cs nopw 0x0(%rax,%rax,1)
119d: nopl (%rax)
```
Cc: Bedirhan KURT <windowz414(a)gnuweeb.org>
Cc: Louvian Lyndal <louvianlyndal(a)gmail.com>
Reported-by: Peter Cordes <peter(a)cordes.ca>
Signed-off-by: Ammar Faizi <ammar.faizi(a)students.amikom.ac.id>
[wt: I did this on purpose due to a misunderstanding of the spec, other
archs will thus have to be rechecked, particularly i386]
Cc: stable(a)vger.kernel.org
Signed-off-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
---
tools/include/nolibc/nolibc.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 3430667b0d241..96b6d56acb572 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -399,14 +399,20 @@ struct stat {
})
/* startup code */
+/*
+ * x86-64 System V ABI mandates:
+ * 1) %rsp must be 16-byte aligned right before the function call.
+ * 2) The deepest stack frame should be zero (the %rbp).
+ *
+ */
asm(".section .text\n"
".global _start\n"
"_start:\n"
"pop %rdi\n" // argc (first arg, %rdi)
"mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
- "and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned when
- "sub $8, %rsp\n" // entering the callee
+ "xor %ebp, %ebp\n" // zero the stack frame
+ "and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
"movzb %al, %rdi\n" // retrieve exit code from 8 lower bits
"mov $60, %rax\n" // NR_exit == 60
--
2.31.1.189.g2e36527f23
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: c7719e79347803b8e3b6b50da8c6db410a3012b5
Gitweb: https://git.kernel.org/tip/c7719e79347803b8e3b6b50da8c6db410a3012b5
Author: Feng Tang <feng.tang(a)intel.com>
AuthorDate: Wed, 17 Nov 2021 10:37:50 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 02 Dec 2021 00:40:35 +01:00
x86/tsc: Add a timer to make sure TSC_adjust is always checked
The TSC_ADJUST register is checked every time a CPU enters idle state, but
Thomas Gleixner mentioned there is still a caveat that a system won't enter
idle [1], either because it's too busy or configured purposely to not enter
idle.
Setup a periodic timer (every 10 minutes) to make sure the check is
happening on a regular base.
[1] https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/
Fixes: 6e3cd95234dc ("x86/hpet: Use another crystalball to evaluate HPET usability")
Requested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Feng Tang <feng.tang(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-1-feng.tang@intel.com
---
arch/x86/kernel/tsc_sync.c | 41 +++++++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+)
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 50a4515..9452dc9 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -30,6 +30,7 @@ struct tsc_adjust {
};
static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+static struct timer_list tsc_sync_check_timer;
/*
* TSC's on different sockets may be reset asynchronously.
@@ -77,6 +78,46 @@ void tsc_verify_tsc_adjust(bool resume)
}
}
+/*
+ * Normally the tsc_sync will be checked every time system enters idle
+ * state, but there is still caveat that a system won't enter idle,
+ * either because it's too busy or configured purposely to not enter
+ * idle.
+ *
+ * So setup a periodic timer (every 10 minutes) to make sure the check
+ * is always on.
+ */
+
+#define SYNC_CHECK_INTERVAL (HZ * 600)
+
+static void tsc_sync_check_timer_fn(struct timer_list *unused)
+{
+ int next_cpu;
+
+ tsc_verify_tsc_adjust(false);
+
+ /* Run the check for all onlined CPUs in turn */
+ next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
+ if (next_cpu >= nr_cpu_ids)
+ next_cpu = cpumask_first(cpu_online_mask);
+
+ tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
+ add_timer_on(&tsc_sync_check_timer, next_cpu);
+}
+
+static int __init start_sync_check_timer(void)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_TSC_ADJUST) || tsc_clocksource_reliable)
+ return 0;
+
+ timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
+ tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
+ add_timer(&tsc_sync_check_timer);
+
+ return 0;
+}
+late_initcall(start_sync_check_timer);
+
static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
unsigned int cpu, bool bootcpu)
{
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b50db7095fe002fa3e16605546cba66bf1b68a3e
Gitweb: https://git.kernel.org/tip/b50db7095fe002fa3e16605546cba66bf1b68a3e
Author: Feng Tang <feng.tang(a)intel.com>
AuthorDate: Wed, 17 Nov 2021 10:37:51 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 02 Dec 2021 00:40:36 +01:00
x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
There are cases that the TSC clocksource is wrongly judged as unstable by
the clocksource watchdog mechanism which tries to validate the TSC against
HPET, PM_TIMER or jiffies. While there is hardly a general reliable way to
check the validity of a watchdog, Thomas Gleixner proposed [1]:
"I'm inclined to lift that requirement when the CPU has:
1) X86_FEATURE_CONSTANT_TSC
2) X86_FEATURE_NONSTOP_TSC
3) X86_FEATURE_NONSTOP_TSC_S3
4) X86_FEATURE_TSC_ADJUST
5) At max. 4 sockets
After two decades of horrors we're finally at a point where TSC seems
to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
was really key as we can now detect even small modifications reliably
and the important point is that we can cure them as well (not pretty
but better than all other options)."
As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
of Atom processorz, and is always coupled with X86_FEATURE_CONSTANT_TSC
and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
to use maximal 2 sockets.
The check is done inside tsc_init() before registering 'tsc-early' and
'tsc' clocksources, as there were cases that both of them had been
wrongly judged as unreliable.
For more background of tsc/watchdog, there is a good summary in [2]
[tglx} Update vs. jiffies:
On systems where the only remaining clocksource aside of TSC is jiffies
there is no way to make this work because that creates a circular
dependency. Jiffies accuracy depends on not missing a periodic timer
interrupt, which is not guaranteed. That could be detected by TSC, but as
TSC is not trusted this cannot be compensated. The consequence is a
circulus vitiosus which results in shutting down TSC and falling back to
the jiffies clocksource which is even more unreliable.
[1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
[2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
[ tglx: Refine comment and amend changelog ]
Fixes: 6e3cd95234dc ("x86/hpet: Use another crystalball to evaluate HPET usability")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Feng Tang <feng.tang(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-2-feng.tang@intel.com
---
arch/x86/kernel/tsc.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2e076a4..a698196 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
EXPORT_SYMBOL_GPL(mark_tsc_unstable);
+static void __init tsc_disable_clocksource_watchdog(void)
+{
+ clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+ clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+}
+
static void __init check_system_tsc_reliable(void)
{
#if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1196,6 +1202,23 @@ static void __init check_system_tsc_reliable(void)
#endif
if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
tsc_clocksource_reliable = 1;
+
+ /*
+ * Disable the clocksource watchdog when the system has:
+ * - TSC running at constant frequency
+ * - TSC which does not stop in C-States
+ * - the TSC_ADJUST register which allows to detect even minimal
+ * modifications
+ * - not more than two sockets. As the number of sockets cannot be
+ * evaluated at the early boot stage where this has to be
+ * invoked, check the number of online memory nodes as a
+ * fallback solution which is an reasonable estimate.
+ */
+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+ boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+ boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+ nr_online_nodes <= 2)
+ tsc_disable_clocksource_watchdog();
}
/*
@@ -1387,9 +1410,6 @@ static int __init init_tsc_clocksource(void)
if (tsc_unstable)
goto unreg;
- if (tsc_clocksource_reliable || no_tsc_watchdog)
- clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-
if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
@@ -1527,7 +1547,7 @@ void __init tsc_init(void)
}
if (tsc_clocksource_reliable || no_tsc_watchdog)
- clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+ tsc_disable_clocksource_watchdog();
clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
detect_art();
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25818
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 0b841dcc6aa5 - drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ Networking bridge: sanity - mlx5
⚡⚡⚡ Ethernet drivers sanity - mlx5
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 3:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 5:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ❌ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 6:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 7:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 2:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
Host 3:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
s390x:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 3:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
x86_64:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
An initialised kobject must be freed using kobject_put() to avoid
leaking associated resources (e.g. the object name).
Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed"
the leak in the first error path of the file registration helper but
left the second one unchanged. This "fix" would however result in a NULL
pointer dereference due to the release function also removing the never
added entry from the fw_cfg_entry_cache list. This has now been
addressed.
Fix the remaining kobject leak by restoring the common error path and
adding the missing kobject_put().
Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg device")
Cc: stable(a)vger.kernel.org # 4.6
Cc: Gabriel Somlo <somlo(a)cmu.edu>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/firmware/qemu_fw_cfg.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index a9c64ebfc49a..ccb7ed62452f 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -603,15 +603,13 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
/* register entry under "/sys/firmware/qemu_fw_cfg/by_key/" */
err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
fw_cfg_sel_ko, "%d", entry->select);
- if (err) {
- kobject_put(&entry->kobj);
- return err;
- }
+ if (err)
+ goto err_put_entry;
/* add raw binary content access */
err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
if (err)
- goto err_add_raw;
+ goto err_del_entry;
/* try adding "/sys/firmware/qemu_fw_cfg/by_name/" symlink */
fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->name);
@@ -620,9 +618,10 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
fw_cfg_sysfs_cache_enlist(entry);
return 0;
-err_add_raw:
+err_del_entry:
kobject_del(&entry->kobj);
- kfree(entry);
+err_put_entry:
+ kobject_put(&entry->kobj);
return err;
}
--
2.32.0
Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed"
a kobject leak in the file registration helper by properly calling
kobject_put() for the entry in case registration of the object fails
(e.g. due to a name collision).
This would however result in a NULL pointer dereference when the
release function tries to remove the never added entry from the
fw_cfg_entry_cache list.
Fix this by moving the list-removal out of the release function.
Note that the offending commit was one of the benign looking umn.edu
fixes which was reviewed but not reverted. [1][2]
[1] https://lore.kernel.org/r/202105051005.49BFABCE@keescook
[2] https://lore.kernel.org/all/YIg7ZOZvS3a8LjSv@kroah.com
Fixes: fe3c60684377 ("firmware: Fix a reference count leak.")
Cc: stable(a)vger.kernel.org # 5.8
Cc: Qiushi Wu <wu000273(a)umn.edu>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/firmware/qemu_fw_cfg.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 172c751a4f6c..a9c64ebfc49a 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -388,9 +388,7 @@ static void fw_cfg_sysfs_cache_cleanup(void)
struct fw_cfg_sysfs_entry *entry, *next;
list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
- /* will end up invoking fw_cfg_sysfs_cache_delist()
- * via each object's release() method (i.e. destructor)
- */
+ fw_cfg_sysfs_cache_delist(entry);
kobject_put(&entry->kobj);
}
}
@@ -448,7 +446,6 @@ static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
{
struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
- fw_cfg_sysfs_cache_delist(entry);
kfree(entry);
}
--
2.32.0
The cec_devnode struct has a lock meant to serialize access
to the fields of this struct. This lock is taken during
device node (un)registration and when opening or releasing a
filehandle to the device node. When the last open filehandle
is closed the cec adapter might be disabled by calling the
adap_enable driver callback with the devnode.lock held.
However, if during that callback a message or event arrives
then the driver will call one of the cec_queue_event()
variants in cec-adap.c, and those will take the same devnode.lock
to walk the open filehandle list.
This obviously causes a deadlock.
This is quite easy to reproduce with the cec-gpio driver since that
uses the cec-pin framework which generated lots of events and uses
a kernel thread for the processing, so when adap_enable is called
the thread is still running and can generate events.
But I suspect that it might also happen with other drivers if an
interrupt arrives signaling e.g. a received message before adap_enable
had a chance to disable the interrupts.
This patch adds a new mutex to serialize access to the fhs list.
When adap_enable() is called the devnode.lock mutex is held, but
not devnode.lock_fhs. The event functions in cec-adap.c will now
use devnode.lock_fhs instead of devnode.lock, ensuring that it is
safe to call those functions from the adap_enable callback.
This specific issue only happens if the last open filehandle is closed
and the physical address is invalid. This is not something that
happens during normal operation, but it does happen when monitoring
CEC traffic (e.g. cec-ctl --monitor) with an unconfigured CEC adapter.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Cc: <stable(a)vger.kernel.org> # for v5.13 and up
---
drivers/media/cec/core/cec-adap.c | 38 +++++++++++++++++--------------
drivers/media/cec/core/cec-api.c | 6 +++++
drivers/media/cec/core/cec-core.c | 3 +++
include/media/cec.h | 11 +++++++--
4 files changed, 39 insertions(+), 19 deletions(-)
diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index da73eb50cce2..857d6b1612c2 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -161,10 +161,10 @@ static void cec_queue_event(struct cec_adapter *adap,
u64 ts = ktime_get_ns();
struct cec_fh *fh;
- mutex_lock(&adap->devnode.lock);
+ mutex_lock(&adap->devnode.lock_fhs);
list_for_each_entry(fh, &adap->devnode.fhs, list)
cec_queue_event_fh(fh, ev, ts);
- mutex_unlock(&adap->devnode.lock);
+ mutex_unlock(&adap->devnode.lock_fhs);
}
/* Notify userspace that the CEC pin changed state at the given time. */
@@ -178,11 +178,12 @@ void cec_queue_pin_cec_event(struct cec_adapter *adap, bool is_high,
};
struct cec_fh *fh;
- mutex_lock(&adap->devnode.lock);
- list_for_each_entry(fh, &adap->devnode.fhs, list)
+ mutex_lock(&adap->devnode.lock_fhs);
+ list_for_each_entry(fh, &adap->devnode.fhs, list) {
if (fh->mode_follower == CEC_MODE_MONITOR_PIN)
cec_queue_event_fh(fh, &ev, ktime_to_ns(ts));
- mutex_unlock(&adap->devnode.lock);
+ }
+ mutex_unlock(&adap->devnode.lock_fhs);
}
EXPORT_SYMBOL_GPL(cec_queue_pin_cec_event);
@@ -195,10 +196,10 @@ void cec_queue_pin_hpd_event(struct cec_adapter *adap, bool is_high, ktime_t ts)
};
struct cec_fh *fh;
- mutex_lock(&adap->devnode.lock);
+ mutex_lock(&adap->devnode.lock_fhs);
list_for_each_entry(fh, &adap->devnode.fhs, list)
cec_queue_event_fh(fh, &ev, ktime_to_ns(ts));
- mutex_unlock(&adap->devnode.lock);
+ mutex_unlock(&adap->devnode.lock_fhs);
}
EXPORT_SYMBOL_GPL(cec_queue_pin_hpd_event);
@@ -211,10 +212,10 @@ void cec_queue_pin_5v_event(struct cec_adapter *adap, bool is_high, ktime_t ts)
};
struct cec_fh *fh;
- mutex_lock(&adap->devnode.lock);
+ mutex_lock(&adap->devnode.lock_fhs);
list_for_each_entry(fh, &adap->devnode.fhs, list)
cec_queue_event_fh(fh, &ev, ktime_to_ns(ts));
- mutex_unlock(&adap->devnode.lock);
+ mutex_unlock(&adap->devnode.lock_fhs);
}
EXPORT_SYMBOL_GPL(cec_queue_pin_5v_event);
@@ -286,12 +287,12 @@ static void cec_queue_msg_monitor(struct cec_adapter *adap,
u32 monitor_mode = valid_la ? CEC_MODE_MONITOR :
CEC_MODE_MONITOR_ALL;
- mutex_lock(&adap->devnode.lock);
+ mutex_lock(&adap->devnode.lock_fhs);
list_for_each_entry(fh, &adap->devnode.fhs, list) {
if (fh->mode_follower >= monitor_mode)
cec_queue_msg_fh(fh, msg);
}
- mutex_unlock(&adap->devnode.lock);
+ mutex_unlock(&adap->devnode.lock_fhs);
}
/*
@@ -302,12 +303,12 @@ static void cec_queue_msg_followers(struct cec_adapter *adap,
{
struct cec_fh *fh;
- mutex_lock(&adap->devnode.lock);
+ mutex_lock(&adap->devnode.lock_fhs);
list_for_each_entry(fh, &adap->devnode.fhs, list) {
if (fh->mode_follower == CEC_MODE_FOLLOWER)
cec_queue_msg_fh(fh, msg);
}
- mutex_unlock(&adap->devnode.lock);
+ mutex_unlock(&adap->devnode.lock_fhs);
}
/* Notify userspace of an adapter state change. */
@@ -1578,6 +1579,7 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block)
/* Disabling monitor all mode should always succeed */
if (adap->monitor_all_cnt)
WARN_ON(call_op(adap, adap_monitor_all_enable, false));
+ /* serialize adap_enable */
mutex_lock(&adap->devnode.lock);
if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) {
WARN_ON(adap->ops->adap_enable(adap, false));
@@ -1589,14 +1591,16 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block)
return;
}
+ /* serialize adap_enable */
mutex_lock(&adap->devnode.lock);
adap->last_initiator = 0xff;
adap->transmit_in_progress = false;
- if ((adap->needs_hpd || list_empty(&adap->devnode.fhs)) &&
- adap->ops->adap_enable(adap, true)) {
- mutex_unlock(&adap->devnode.lock);
- return;
+ if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) {
+ if (adap->ops->adap_enable(adap, true)) {
+ mutex_unlock(&adap->devnode.lock);
+ return;
+ }
}
if (adap->monitor_all_cnt &&
diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
index 0edb7142afdb..d72ad48c9898 100644
--- a/drivers/media/cec/core/cec-api.c
+++ b/drivers/media/cec/core/cec-api.c
@@ -586,6 +586,7 @@ static int cec_open(struct inode *inode, struct file *filp)
return err;
}
+ /* serialize adap_enable */
mutex_lock(&devnode->lock);
if (list_empty(&devnode->fhs) &&
!adap->needs_hpd &&
@@ -624,7 +625,9 @@ static int cec_open(struct inode *inode, struct file *filp)
}
#endif
+ mutex_lock(&devnode->lock_fhs);
list_add(&fh->list, &devnode->fhs);
+ mutex_unlock(&devnode->lock_fhs);
mutex_unlock(&devnode->lock);
return 0;
@@ -653,8 +656,11 @@ static int cec_release(struct inode *inode, struct file *filp)
cec_monitor_all_cnt_dec(adap);
mutex_unlock(&adap->lock);
+ /* serialize adap_enable */
mutex_lock(&devnode->lock);
+ mutex_lock(&devnode->lock_fhs);
list_del(&fh->list);
+ mutex_unlock(&devnode->lock_fhs);
if (cec_is_registered(adap) && list_empty(&devnode->fhs) &&
!adap->needs_hpd && adap->phys_addr == CEC_PHYS_ADDR_INVALID) {
WARN_ON(adap->ops->adap_enable(adap, false));
diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c
index 551689d371a7..ec67065d5202 100644
--- a/drivers/media/cec/core/cec-core.c
+++ b/drivers/media/cec/core/cec-core.c
@@ -169,8 +169,10 @@ static void cec_devnode_unregister(struct cec_adapter *adap)
devnode->registered = false;
devnode->unregistered = true;
+ mutex_lock(&devnode->lock_fhs);
list_for_each_entry(fh, &devnode->fhs, list)
wake_up_interruptible(&fh->wait);
+ mutex_unlock(&devnode->lock_fhs);
mutex_unlock(&devnode->lock);
@@ -272,6 +274,7 @@ struct cec_adapter *cec_allocate_adapter(const struct cec_adap_ops *ops,
/* adap->devnode initialization */
INIT_LIST_HEAD(&adap->devnode.fhs);
+ mutex_init(&adap->devnode.lock_fhs);
mutex_init(&adap->devnode.lock);
adap->kthread = kthread_run(cec_thread_func, adap, "cec-%s", name);
diff --git a/include/media/cec.h b/include/media/cec.h
index 208c9613c07e..77346f757036 100644
--- a/include/media/cec.h
+++ b/include/media/cec.h
@@ -26,13 +26,17 @@
* @dev: cec device
* @cdev: cec character device
* @minor: device node minor number
+ * @lock: lock to serialize open/release and registration
* @registered: the device was correctly registered
* @unregistered: the device was unregistered
+ * @lock_fhs: lock to control access to @fhs
* @fhs: the list of open filehandles (cec_fh)
- * @lock: lock to control access to this structure
*
* This structure represents a cec-related device node.
*
+ * To add or remove filehandles from @fhs the @lock must be taken first,
+ * followed by @lock_fhs. It is safe to access @fhs if either lock is held.
+ *
* The @parent is a physical device. It must be set by core or device drivers
* before registering the node.
*/
@@ -43,10 +47,13 @@ struct cec_devnode {
/* device info */
int minor;
+ /* serialize open/release and registration */
+ struct mutex lock;
bool registered;
bool unregistered;
+ /* protect access to fhs */
+ struct mutex lock_fhs;
struct list_head fhs;
- struct mutex lock;
};
struct cec_adapter;
--
2.33.0
From: Guangming <Guangming.Cao(a)mediatek.com>
For previous version, it uses 'sg_table.nent's to traverse sg_table in pages
free flow.
However, 'sg_table.nents' is reassigned in 'dma_map_sg', it means the number of
created entries in the DMA adderess space.
So, use 'sg_table.nents' in pages free flow will case some pages can't be freed.
Here we should use sg_table.orig_nents to free pages memory, but use the
sgtable helper 'for each_sgtable_sg'(, instead of the previous rather common
helper 'for_each_sg' which maybe cause memory leak) is much better.
Fixes: d963ab0f15fb0 ("dma-buf: system_heap: Allocate higher order pages if available")
Signed-off-by: Guangming <Guangming.Cao(a)mediatek.com>
---
drivers/dma-buf/heaps/system_heap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 23a7e74ef966..8660508f3684 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
int i;
table = &buffer->sg_table;
- for_each_sg(table->sgl, sg, table->nents, i) {
+ for_each_sgtable_sg(table, sg, i) {
struct page *page = sg_page(sg);
__free_pages(page, compound_order(page));
--
2.17.1
The five patches referencing this cover letter are backports of upstream
fix "3f015d89a47c NFSv42: Fix pagecache invalidation after COPY/CLONE",
which did not apply cleanly to longterm trees.
Note: while the upstream fix received extensive testing, these backports
have been build-tested only.
Benjamin Coddington (1):
NFSv42: Fix pagecache invalidation after COPY/CLONE
fs/nfs/nfs42proc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--
2.31.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 126e8bee943e9926238c891e2df5b5573aee76bc Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:18 -0800
Subject: [PATCH] ipc: WARN if trying to remove ipc object which is absent
Patch series "shm: shm_rmid_forced feature fixes".
Some time ago I met kernel crash after CRIU restore procedure,
fortunately, it was CRIU restore, so, I had dump files and could do
restore many times and crash reproduced easily. After some
investigation I've constructed the minimal reproducer. It was found
that it's use-after-free and it happens only if sysctl
kernel.shm_rmid_forced = 1.
The key of the problem is that the exit_shm() function not handles shp's
object destroy when task->sysvshm.shm_clist contains items from
different IPC namespaces. In most cases this list will contain only
items from one IPC namespace.
How can this list contain object from different namespaces? The
exit_shm() function is designed to clean up this list always when
process leaves IPC namespace. But we made a mistake a long time ago and
did not add a exit_shm() call into the setns() syscall procedures.
The first idea was just to add this call to setns() syscall but it
obviously changes semantics of setns() syscall and that's
userspace-visible change. So, I gave up on this idea.
The first real attempt to address the issue was just to omit forced
destroy if we meet shp object not from current task IPC namespace [1].
But that was not the best idea because task->sysvshm.shm_clist was
protected by rwsem which belongs to current task IPC namespace. It
means that list corruption may occur.
Second approach is just extend exit_shm() to properly handle shp's from
different IPC namespaces [2]. This is really non-trivial thing, I've
put a lot of effort into that but not believed that it's possible to
make it fully safe, clean and clear.
Thanks to the efforts of Manfred Spraul working an elegant solution was
designed. Thanks a lot, Manfred!
Eric also suggested the way to address the issue in ("[RFC][PATCH] shm:
In shm_exit destroy all created and never attached segments") Eric's
idea was to maintain a list of shm_clists one per IPC namespace, use
lock-less lists. But there is some extra memory consumption-related
concerns.
An alternative solution which was suggested by me was implemented in
("shm: reset shm_clist on setns but omit forced shm destroy"). The idea
is pretty simple, we add exit_shm() syscall to setns() but DO NOT
destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just
clean up the task->sysvshm.shm_clist list.
This chages semantics of setns() syscall a little bit but in comparision
to the "naive" solution when we just add exit_shm() without any special
exclusions this looks like a safer option.
[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736
This patch (of 2):
Let's produce a warning if we trying to remove non-existing IPC object
from IPC namespace kht/idr structures.
This allows us to catch possible bugs when the ipc_rmid() function was
called with inconsistent struct ipc_ids*, struct kern_ipc_perm*
arguments.
Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@vir…
Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@vir…
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/ipc/util.c b/ipc/util.c
index d48d8cfa1f3f..fa2d86ef3fb8 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -447,8 +447,8 @@ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
static void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
if (ipcp->key != IPC_PRIVATE)
- rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
- ipc_kht_params);
+ WARN_ON_ONCE(rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
+ ipc_kht_params));
}
/**
@@ -498,7 +498,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
int idx = ipcid_to_idx(ipcp->id);
- idr_remove(&ids->ipcs_idr, idx);
+ WARN_ON_ONCE(idr_remove(&ids->ipcs_idr, idx) != ipcp);
ipc_kht_remove(ids, ipcp);
ids->in_use--;
ipcp->deleted = true;
Hi Greg,
it seems that many reported issues for 5.15 and 5.14 kernels are fixed
by the recent development on 5.16-rc, and it's worth to backport those
to 5.15.x stable.
Below is the list of commits from the mainline. Could you cherry-pick
them to 5.15.x stable? Applying from top to bottom.
I confirmed that they can be applied and built cleanly on 5.15.5.
4e7cf1fbb34ecb472c073980458cbe413afd4d64
9c9a3b9da891cc70405a544da6855700eddcbb71
e581f1cec4f899f788f6c9477f805b1d5fef25e2
bceee75387554f682638e719d1ea60125ea78cea
d215f63d49da9a8803af3e81acd6cad743686573
0ef74366bc150dda4f53c546dfa6e8f7c707e087
d5f871f89e21bb71827ea57bd484eedea85839a0
813a17cab9b708bbb1e0db8902e19857b57196ec
23939115be181bc5dbc33aa8471adcdbffa28910
53451b6da8271905941eb1eb369db152c4bd92f2
eee5d6f1356a016105a974fb176b491288439efa
83de8f83816e8e15227dac985163e3d433a2bf9d
The diffstat of those are like below:
sound/usb/card.h | 10 ++-
sound/usb/endpoint.c | 223 +++++++++++++++++++++++++++++++++++++--------------
sound/usb/endpoint.h | 13 ++-
sound/usb/pcm.c | 165 +++++++++++++++++++++++++++++--------
4 files changed, 306 insertions(+), 105 deletions(-)
If you prefer, I can submit the patches, too.
Thanks!
Takashi
While working on supporting the Intel HDR backlight interface, I noticed
that there's a couple of laptops that will very rarely manage to boot up
without detecting Intel HDR backlight support - even though it's supported
on the system. One example of such a laptop is the Lenovo P17 1st
generation.
Following some investigation Ville Syrjälä did through the docs they have
available to them, they discovered that there's actually supposed to be a
30ms wait after writing the source OUI before we begin setting up the rest
of the backlight interface.
This seems to be correct, as adding this 30ms delay seems to have
completely fixed the probing issues I was previously seeing. So - let's
start performing a 30ms wait after writing the OUI, which we do in a manner
similar to how we keep track of PPS delays (e.g. record the timestamp of
the OUI write, and then wait for however many ms are left since that
timestamp right before we interact with the backlight) in order to avoid
waiting any longer then we need to. As well, this also avoids us performing
this delay on systems where we don't end up using the HDR backlight
interface.
V2:
* Move panel delays into intel_pps
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 4a8d79901d5b ("drm/i915/dp: Enable Intel's HDR backlight interface (only SDR for now)")
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.12+
---
drivers/gpu/drm/i915/display/intel_display_types.h | 4 ++++
drivers/gpu/drm/i915/display/intel_dp.c | 11 +++++++++++
drivers/gpu/drm/i915/display/intel_dp.h | 2 ++
drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c | 5 +++++
4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index ea1e8a6e10b0..ad64f9caa7ff 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -1485,6 +1485,7 @@ struct intel_pps {
bool want_panel_vdd;
unsigned long last_power_on;
unsigned long last_backlight_off;
+ unsigned long last_oui_write;
ktime_t panel_power_off_time;
intel_wakeref_t vdd_wakeref;
@@ -1653,6 +1654,9 @@ struct intel_dp {
struct intel_dp_pcon_frl frl;
struct intel_psr psr;
+
+ /* When we last wrote the OUI for eDP */
+ unsigned long last_oui_write;
};
enum lspcon_vendor {
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 0a424bf69396..45318891ba07 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -29,6 +29,7 @@
#include <linux/i2c.h>
#include <linux/notifier.h>
#include <linux/slab.h>
+#include <linux/timekeeping.h>
#include <linux/types.h>
#include <asm/byteorder.h>
@@ -2010,6 +2011,16 @@ intel_edp_init_source_oui(struct intel_dp *intel_dp, bool careful)
if (drm_dp_dpcd_write(&intel_dp->aux, DP_SOURCE_OUI, oui, sizeof(oui)) < 0)
drm_err(&i915->drm, "Failed to write source OUI\n");
+
+ intel_dp->pps.last_oui_write = jiffies;
+}
+
+void intel_dp_wait_source_oui(struct intel_dp *intel_dp)
+{
+ struct drm_i915_private *i915 = dp_to_i915(intel_dp);
+
+ drm_dbg_kms(&i915->drm, "Performing OUI wait\n");
+ wait_remaining_ms_from_jiffies(intel_dp->last_oui_write, 30);
}
/* If the device supports it, try to set the power state appropriately */
diff --git a/drivers/gpu/drm/i915/display/intel_dp.h b/drivers/gpu/drm/i915/display/intel_dp.h
index ce229026dc91..b64145a3869a 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.h
+++ b/drivers/gpu/drm/i915/display/intel_dp.h
@@ -119,4 +119,6 @@ void intel_dp_pcon_dsc_configure(struct intel_dp *intel_dp,
const struct intel_crtc_state *crtc_state);
void intel_dp_phy_test(struct intel_encoder *encoder);
+void intel_dp_wait_source_oui(struct intel_dp *intel_dp);
+
#endif /* __INTEL_DP_H__ */
diff --git a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
index 8b9c925c4c16..62c112daacf2 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
@@ -36,6 +36,7 @@
#include "intel_backlight.h"
#include "intel_display_types.h"
+#include "intel_dp.h"
#include "intel_dp_aux_backlight.h"
/* TODO:
@@ -106,6 +107,8 @@ intel_dp_aux_supports_hdr_backlight(struct intel_connector *connector)
int ret;
u8 tcon_cap[4];
+ intel_dp_wait_source_oui(intel_dp);
+
ret = drm_dp_dpcd_read(aux, INTEL_EDP_HDR_TCON_CAP0, tcon_cap, sizeof(tcon_cap));
if (ret != sizeof(tcon_cap))
return false;
@@ -204,6 +207,8 @@ intel_dp_aux_hdr_enable_backlight(const struct intel_crtc_state *crtc_state,
int ret;
u8 old_ctrl, ctrl;
+ intel_dp_wait_source_oui(intel_dp);
+
ret = drm_dp_dpcd_readb(&intel_dp->aux, INTEL_EDP_HDR_GETSET_CTRL_PARAMS, &old_ctrl);
if (ret != 1) {
drm_err(&i915->drm, "Failed to read current backlight control mode: %d\n", ret);
--
2.33.1
Le 30/11/2021 à 17:26, Stephen Suryaputra a écrit :
> IPCB/IP6CB need to be initialized when processing outbound v4 or v6 pkts
> in the codepath of vrf device xmit function so that leftover garbage
> doesn't cause futher code that uses the CB to incorrectly process the
> pkt.
>
> One occasion of the issue might occur when MPLS route uses the vrf
> device as the outgoing device such as when the route is added using "ip
> -f mpls route add <label> dev <vrf>" command.
>
> The problems seems to exist since day one. Hence I put the day one
> commits on the Fixes tags.
>
> Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
> Fixes: 35402e313663 ("net: Add IPv6 support to VRF device")
> Signed-off-by: Stephen Suryaputra <ssuryaextr(a)gmail.com>
Cc: stable(a)vger.kernel.org
Thanks,
Nicolas
Building sh:rts7751r2dplus_defconfig:net,rtl8139:initrd ... failed
------------
Error log:
In file included from arch/sh/mm/init.c:25:
./arch/sh/include/asm/tlb.h:118:15: error: unknown type name 'voide'
118 | static inline voide
| ^~~~~
./arch/sh/include/asm/tlb.h: In function 'tlb_flush_pmd_range':
./arch/sh/include/asm/tlb.h:126:1: error: no return statement in function returning non-void [-Werror=return-type]
Guenter
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25737
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 862cfde7c0e2 - drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ Networking bridge: sanity - mlx5
⚡⚡⚡ Ethernet drivers sanity - mlx5
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ Networking bridge: sanity - mlx5
⚡⚡⚡ Ethernet drivers sanity - mlx5
Host 7:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 3:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - nvmeof-mp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
s390x:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25734
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 27a717ac84ce - drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 2:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 5:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 6:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ❌ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 4:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 5:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
s390x:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 3:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
On Wed, Nov 24, 2021 at 4:54 AM Rafael J. Wysocki <rafael(a)kernel.org> wrote:
>
> On Tue, Nov 16, 2021 at 9:22 PM Evan Green <evgreen(a)chromium.org> wrote:
> >
> > On Tue, Nov 16, 2021 at 9:54 AM Rafael J. Wysocki <rafael(a)kernel.org> wrote:
> > >
> > > On Mon, Nov 15, 2021 at 6:13 PM Evan Green <evgreen(a)chromium.org> wrote:
> > > >
> > > > Gentle bump.
> > > >
> > > >
> > > > On Fri, Oct 29, 2021 at 12:24 PM Evan Green <evgreen(a)chromium.org> wrote:
> > > > >
> > > > > snapshot_write() is inappropriately limiting the amount of data that can
> > > > > be written in cases where a partial page has already been written. For
> > > > > example, one would expect to be able to write 1 byte, then 4095 bytes to
> > > > > the snapshot device, and have both of those complete fully (since now
> > > > > we're aligned to a page again). But what ends up happening is we write 1
> > > > > byte, then 4094/4095 bytes complete successfully.
> > > > >
> > > > > The reason is that simple_write_to_buffer()'s second argument is the
> > > > > total size of the buffer, not the size of the buffer minus the offset.
> > > > > Since simple_write_to_buffer() accounts for the offset in its
> > > > > implementation, snapshot_write() can just pass the full page size
> > > > > directly down.
> > > > >
> > > > > Signed-off-by: Evan Green <evgreen(a)chromium.org>
> > > > > ---
> > > > >
> > > > > kernel/power/user.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/power/user.c b/kernel/power/user.c
> > > > > index 740723bb388524..ad241b4ff64c58 100644
> > > > > --- a/kernel/power/user.c
> > > > > +++ b/kernel/power/user.c
> > > > > @@ -177,7 +177,7 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
> > > > > if (res <= 0)
> > > > > goto unlock;
> > > > > } else {
> > > > > - res = PAGE_SIZE - pg_offp;
> > > > > + res = PAGE_SIZE;
> > > > > }
> > > > >
> > > > > if (!data_of(data->handle)) {
> > > > > --
> > >
> > > Do you actually see this problem in practice?
> >
> > Yes. I may fire up another thread to explain why I'm stuck doing a
> > partial page write, and how I might be able to stop doing that in the
> > future with some kernel help. But either way, this is a bug.
>
> OK, patch applied as 5.16-rc material.
>
> I guess it should go into -stable kernels too?
Yes, putting it into -stable would make sense also. I should have CCed
them originally, doing that now.
-Evan
Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.
However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.
This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.
Test-by: Hangbin Liu <liuhangbin(a)gmail.com>
Reported-by: Xiumei Mu <xmu(a)redhat.com>
Cc: Hangbin Liu <liuhangbin(a)gmail.com>
Cc: Toke Høiland-Jørgensen <toke(a)redhat.com>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Fixes: 900575aa33a3 ("wireguard: device: avoid circular netns references")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
drivers/net/wireguard/device.c | 3 +++
drivers/net/wireguard/socket.c | 2 +-
include/net/dst_cache.h | 11 ++++++++++
net/core/dst_cache.c | 19 +++++++++++++++++
tools/testing/selftests/wireguard/netns.sh | 24 +++++++++++++++++++++-
5 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 551ddaaaf540..77e64ea6be67 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -398,6 +398,7 @@ static struct rtnl_link_ops link_ops __read_mostly = {
static void wg_netns_pre_exit(struct net *net)
{
struct wg_device *wg;
+ struct wg_peer *peer;
rtnl_lock();
list_for_each_entry(wg, &device_list, device_list) {
@@ -407,6 +408,8 @@ static void wg_netns_pre_exit(struct net *net)
mutex_lock(&wg->device_update_lock);
rcu_assign_pointer(wg->creating_net, NULL);
wg_socket_reinit(wg, NULL, NULL);
+ list_for_each_entry(peer, &wg->peer_list, peer_list)
+ wg_socket_clear_peer_endpoint_src(peer);
mutex_unlock(&wg->device_update_lock);
}
}
diff --git a/drivers/net/wireguard/socket.c b/drivers/net/wireguard/socket.c
index 8c496b747108..6f07b949cb81 100644
--- a/drivers/net/wireguard/socket.c
+++ b/drivers/net/wireguard/socket.c
@@ -308,7 +308,7 @@ void wg_socket_clear_peer_endpoint_src(struct wg_peer *peer)
{
write_lock_bh(&peer->endpoint_lock);
memset(&peer->endpoint.src6, 0, sizeof(peer->endpoint.src6));
- dst_cache_reset(&peer->endpoint_cache);
+ dst_cache_reset_now(&peer->endpoint_cache);
write_unlock_bh(&peer->endpoint_lock);
}
diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index 67634675e919..df6622a5fe98 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -79,6 +79,17 @@ static inline void dst_cache_reset(struct dst_cache *dst_cache)
dst_cache->reset_ts = jiffies;
}
+/**
+ * dst_cache_reset_now - invalidate the cache contents immediately
+ * @dst_cache: the cache
+ *
+ * The caller must be sure there are no concurrent users, as this frees
+ * all dst_cache users immediately, rather than waiting for the next
+ * per-cpu usage like dst_cache_reset does. Most callers should use the
+ * higher speed lazily-freed dst_cache_reset function instead.
+ */
+void dst_cache_reset_now(struct dst_cache *dst_cache);
+
/**
* dst_cache_init - initialize the cache, allocating the required storage
* @dst_cache: the cache
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index be74ab4551c2..0ccfd5fa5cb9 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -162,3 +162,22 @@ void dst_cache_destroy(struct dst_cache *dst_cache)
free_percpu(dst_cache->cache);
}
EXPORT_SYMBOL_GPL(dst_cache_destroy);
+
+void dst_cache_reset_now(struct dst_cache *dst_cache)
+{
+ int i;
+
+ if (!dst_cache->cache)
+ return;
+
+ dst_cache->reset_ts = jiffies;
+ for_each_possible_cpu(i) {
+ struct dst_cache_pcpu *idst = per_cpu_ptr(dst_cache->cache, i);
+ struct dst_entry *dst = idst->dst;
+
+ idst->cookie = 0;
+ idst->dst = NULL;
+ dst_release(dst);
+ }
+}
+EXPORT_SYMBOL_GPL(dst_cache_reset_now);
diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
index 2e5c1630885e..8a9461aa0878 100755
--- a/tools/testing/selftests/wireguard/netns.sh
+++ b/tools/testing/selftests/wireguard/netns.sh
@@ -613,6 +613,28 @@ ip0 link set wg0 up
kill $ncat_pid
ip0 link del wg0
+# Ensure that dst_cache references don't outlive netns lifetime
+ip1 link add dev wg0 type wireguard
+ip2 link add dev wg0 type wireguard
+configure_peers
+ip1 link add veth1 type veth peer name veth2
+ip1 link set veth2 netns $netns2
+ip1 addr add fd00:aa::1/64 dev veth1
+ip2 addr add fd00:aa::2/64 dev veth2
+ip1 link set veth1 up
+ip2 link set veth2 up
+waitiface $netns1 veth1
+waitiface $netns2 veth2
+ip1 -6 route add default dev veth1 via fd00:aa::2
+ip2 -6 route add default dev veth2 via fd00:aa::1
+n1 wg set wg0 peer "$pub2" endpoint [fd00:aa::2]:2
+n2 wg set wg0 peer "$pub1" endpoint [fd00:aa::1]:1
+n1 ping6 -c 1 fd00::2
+pp ip netns delete $netns1
+pp ip netns delete $netns2
+pp ip netns add $netns1
+pp ip netns add $netns2
+
# Ensure there aren't circular reference loops
ip1 link add wg1 type wireguard
ip2 link add wg2 type wireguard
@@ -631,7 +653,7 @@ while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do
done < /dev/kmsg
alldeleted=1
for object in "${!objects[@]}"; do
- if [[ ${objects["$object"]} != *createddestroyed ]]; then
+ if [[ ${objects["$object"]} != *createddestroyed && ${objects["$object"]} != *createdcreateddestroyeddestroyed ]]; then
echo "Error: $object: merely ${objects["$object"]}" >&3
alldeleted=0
fi
--
2.34.1
Before commit 86585c61972f ("hwmon: (pwm-fan) stop using legacy
PWM functions and some cleanups") pwm_apply_state() was called
unconditionally in pwm_fan_probe(). In this commit this direct
call was replaced by a call to __set_pwm(ct, MAX_PWM) which
however is a noop if ctx->pwm_value already matches the value to
set.
After probe the fan is supposed to run at full speed, and the
internal driver state suggests it does, but this isn't asserted
and depending on bootloader and pwm low-level driver, the fan
might just be off.
So drop setting pwm_value to MAX_PWM to ensure the check in
__set_pwm doesn't make it exit early and the fan goes on as
intended.
Cc: stable(a)vger.kernel.org
Fixes: 86585c61972f ("hwmon: (pwm-fan) stop using legacy PWM functions and some cleanups")
Signed-off-by: Billy Tsai <billy_tsai(a)aspeedtech.com>
---
drivers/hwmon/pwm-fan.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/hwmon/pwm-fan.c b/drivers/hwmon/pwm-fan.c
index 17518b4cab1b..f12b9a28a232 100644
--- a/drivers/hwmon/pwm-fan.c
+++ b/drivers/hwmon/pwm-fan.c
@@ -336,8 +336,6 @@ static int pwm_fan_probe(struct platform_device *pdev)
return ret;
}
- ctx->pwm_value = MAX_PWM;
-
pwm_init_state(ctx->pwm, &ctx->pwm_state);
/*
--
2.25.1
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25602
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: b6f81f5c9c8e - cifs: nosharesock should be set on new server
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 4:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 5:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 7:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ❌ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 3:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - nvmeof-mp
Host 5:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
s390x:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch titled
Subject: timers: implement usleep_idle_range()
has been added to the -mm tree. Its filename is
timers-implement-usleep_idle_range.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/timers-implement-usleep_idle_rang…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/timers-implement-usleep_idle_rang…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: timers: implement usleep_idle_range()
Patch series "mm/damon: Fix fake /proc/loadavg reports", v3.
This patchset fixes DAMON's fake load report issue. The first patch makes
yet another variant of usleep_range() for this fix, and the second patch
fixes the issue of DAMON by making it using the newly introduced function.
This patch (of 2):
Some kernel threads such as DAMON could need to repeatedly sleep in micro
seconds level. Because usleep_range() sleeps in uninterruptible state,
however, such threads would make /proc/loadavg reports fake load.
To help such cases, this commit implements a variant of usleep_range()
called usleep_idle_range(). It is same to usleep_range() but sets the
state of the current task as TASK_IDLE while sleeping.
Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Suggested-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/delay.h | 14 +++++++++++++-
kernel/time/timer.c | 16 +++++++++-------
2 files changed, 22 insertions(+), 8 deletions(-)
--- a/include/linux/delay.h~timers-implement-usleep_idle_range
+++ a/include/linux/delay.h
@@ -20,6 +20,7 @@
*/
#include <linux/math.h>
+#include <linux/sched.h>
extern unsigned long loops_per_jiffy;
@@ -58,7 +59,18 @@ void calibrate_delay(void);
void __attribute__((weak)) calibration_delay_done(void);
void msleep(unsigned int msecs);
unsigned long msleep_interruptible(unsigned int msecs);
-void usleep_range(unsigned long min, unsigned long max);
+void usleep_range_state(unsigned long min, unsigned long max,
+ unsigned int state);
+
+static inline void usleep_range(unsigned long min, unsigned long max)
+{
+ usleep_range_state(min, max, TASK_UNINTERRUPTIBLE);
+}
+
+static inline void usleep_idle_range(unsigned long min, unsigned long max)
+{
+ usleep_range_state(min, max, TASK_IDLE);
+}
static inline void ssleep(unsigned int seconds)
{
--- a/kernel/time/timer.c~timers-implement-usleep_idle_range
+++ a/kernel/time/timer.c
@@ -2054,26 +2054,28 @@ unsigned long msleep_interruptible(unsig
EXPORT_SYMBOL(msleep_interruptible);
/**
- * usleep_range - Sleep for an approximate time
- * @min: Minimum time in usecs to sleep
- * @max: Maximum time in usecs to sleep
+ * usleep_range_state - Sleep for an approximate time in a given state
+ * @min: Minimum time in usecs to sleep
+ * @max: Maximum time in usecs to sleep
+ * @state: State of the current task that will be while sleeping
*
* In non-atomic context where the exact wakeup time is flexible, use
- * usleep_range() instead of udelay(). The sleep improves responsiveness
+ * usleep_range_state() instead of udelay(). The sleep improves responsiveness
* by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces
* power usage by allowing hrtimers to take advantage of an already-
* scheduled interrupt instead of scheduling a new one just for this sleep.
*/
-void __sched usleep_range(unsigned long min, unsigned long max)
+void __sched usleep_range_state(unsigned long min, unsigned long max,
+ unsigned int state)
{
ktime_t exp = ktime_add_us(ktime_get(), min);
u64 delta = (u64)(max - min) * NSEC_PER_USEC;
for (;;) {
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ __set_current_state(state);
/* Do not return before the requested sleep time has elapsed */
if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
break;
}
}
-EXPORT_SYMBOL(usleep_range);
+EXPORT_SYMBOL(usleep_range_state);
_
Patches currently in -mm which might be from sj(a)kernel.org are
timers-implement-usleep_idle_range.patch
mm-damon-core-fix-fake-load-reports-due-to-uninterruptible-sleeps.patch
mm-damon-remove-some-no-need-func-definitions-in-damonh-file-fix.patch
If APICv is disabled for this vCPU, assigned devices may
still attempt to post interrupts. In that case, we need
to cancel the vmentry and deliver the interrupt with
KVM_REQ_EVENT. Extend the existing code that handles
injection of L1 interrupts into L2 to cover this case
as well.
vmx_hwapic_irr_update is only called when APICv is active
so it would be confusing to add a check for
vcpu->arch.apicv_active in there. Instead, just use
vmx_set_rvi directly in vmx_sync_pir_to_irr.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ba66c171d951..cccf1eab58ac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6264,7 +6264,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
int max_irr;
bool max_irr_updated;
- if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))
+ if (KVM_BUG_ON(!enable_apicv, vcpu->kvm))
return -EIO;
if (pi_test_on(&vmx->pi_desc)) {
@@ -6276,20 +6276,31 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
smp_mb__after_atomic();
max_irr_updated =
kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr);
-
- /*
- * If we are running L2 and L1 has a new pending interrupt
- * which can be injected, this may cause a vmexit or it may
- * be injected into L2. Either way, this interrupt will be
- * processed via KVM_REQ_EVENT, not RVI, because we do not use
- * virtual interrupt delivery to inject L1 interrupts into L2.
- */
- if (is_guest_mode(vcpu) && max_irr_updated)
- kvm_make_request(KVM_REQ_EVENT, vcpu);
} else {
max_irr = kvm_lapic_find_highest_irr(vcpu);
+ max_irr_updated = false;
}
- vmx_hwapic_irr_update(vcpu, max_irr);
+
+ /*
+ * If virtual interrupt delivery is not in use, the interrupt
+ * will be processed via KVM_REQ_EVENT, not RVI. This can happen
+ * in two cases:
+ *
+ * 1) If we are running L2 and L1 has a new pending interrupt
+ * which can be injected, this may cause a vmexit or it may
+ * be injected into L2. We do not use virtual interrupt
+ * delivery to inject L1 interrupts into L2.
+ *
+ * 2) If APICv is disabled for this vCPU, assigned devices may
+ * still attempt to post interrupts. The posted interrupt
+ * vector will cause a vmexit and the subsequent entry will
+ * call sync_pir_to_irr.
+ */
+ if (!is_guest_mode(vcpu) && vcpu->arch.apicv_active)
+ vmx_set_rvi(max_irr);
+ else if (max_irr_updated)
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+
return max_irr;
}
--
2.27.0
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25652
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 83e74b927dce - block: avoid to quiesce queue in elevator_init_mq
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 2:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 6:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ❌ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 7:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 3:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
s390x:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ❌ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
Currently, checks for whether VT-d PI can be used refer to the current
status of the feature in the current vCPU; or they more or less pick
vCPU 0 in case a specific vCPU is not available.
However, these checks do not attempt to synchronize with changes to
the IRTE. In particular, there is no path that updates the IRTE when
APICv is re-activated on vCPU 0; and there is no path to wakeup a CPU
that has APICv disabled, if the wakeup occurs because of an IRTE
that points to a posted interrupt.
To fix this, always go through the VT-d PI path as long as there are
assigned devices and APICv is available on both the host and the VM side.
Since the relevant condition was copied over three times, take the hint
and factor it into a separate function.
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/posted_intr.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5f81ef092bd4..1c94783b5a54 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -5,6 +5,7 @@
#include <asm/cpu.h>
#include "lapic.h"
+#include "irq.h"
#include "posted_intr.h"
#include "trace.h"
#include "vmx.h"
@@ -77,13 +78,18 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
pi_set_on(pi_desc);
}
+static bool vmx_can_use_vtd_pi(struct kvm *kvm)
+{
+ return irqchip_in_kernel(kvm) && enable_apicv &&
+ kvm_arch_has_assigned_device(kvm) &&
+ irq_remapping_cap(IRQ_POSTING_CAP);
+}
+
void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
{
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(vcpu))
+ if (!vmx_can_use_vtd_pi(vcpu->kvm))
return;
/* Set SN when the vCPU is preempted */
@@ -141,9 +147,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
struct pi_desc old, new;
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(vcpu))
+ if (!vmx_can_use_vtd_pi(vcpu->kvm))
return 0;
WARN_ON(irqs_disabled());
@@ -270,9 +274,7 @@ int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
struct vcpu_data vcpu_info;
int idx, ret = 0;
- if (!kvm_arch_has_assigned_device(kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(kvm->vcpus[0]))
+ if (!vmx_can_use_vtd_pi(kvm))
return 0;
idx = srcu_read_lock(&kvm->irq_srcu);
--
2.27.0
The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
if APICv is disabled on the vCPU that receives it. In that case, the
interrupt will just cause a vmexit and leave the ON bit set together
with the PIR bit corresponding to the interrupt.
Right now, the interrupt would not be delivered until APICv is re-enabled.
However, fixing this is just a matter of always doing the PIR->IRR
synchronization, even if the vCPU has temporarily disabled APICv.
This is not a problem for performance, or if anything it is an
improvement. First, in the common case where vcpu->arch.apicv_active is
true, one fewer check has to be performed. Second, static_call_cond will
elide the function call if APICv is not present or disabled. Finally,
in the case for AMD hardware we can remove the sync_pir_to_irr callback:
it is only needed for apic_has_interrupt_for_ppr, and that function
already has a fallback for !APICv.
Cc: stable(a)vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/svm/svm.c | 1 -
arch/x86/kvm/x86.c | 18 +++++++++---------
3 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 759952dd1222..f206fc35deff 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -707,7 +707,7 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
{
int highest_irr;
- if (apic->vcpu->arch.apicv_active)
+ if (kvm_x86_ops.sync_pir_to_irr)
highest_irr = static_call(kvm_x86_sync_pir_to_irr)(apic->vcpu);
else
highest_irr = apic_find_highest_irr(apic);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5630c241d5f6..d0f68d11ec70 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4651,7 +4651,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_irr_update = svm_hwapic_irr_update,
.hwapic_isr_update = svm_hwapic_isr_update,
- .sync_pir_to_irr = kvm_lapic_find_highest_irr,
.apicv_post_state_restore = avic_post_state_restore,
.set_tss_addr = svm_set_tss_addr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 441f4769173e..a8f12c83db4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4448,8 +4448,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
{
- if (vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
return kvm_apic_get_state(vcpu, s);
}
@@ -9528,8 +9527,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
if (irqchip_split(vcpu->kvm))
kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
else {
- if (vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (ioapic_in_kernel(vcpu->kvm))
kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
}
@@ -9802,10 +9800,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
/*
* This handles the case where a posted interrupt was
- * notified with kvm_vcpu_kick.
+ * notified with kvm_vcpu_kick. Assigned devices can
+ * use the POSTED_INTR_VECTOR even if APICv is disabled,
+ * so do it even if APICv is disabled on this vCPU.
*/
- if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (kvm_vcpu_exit_request(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -9849,8 +9849,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
break;
- if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (unlikely(kvm_vcpu_exit_request(vcpu))) {
exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
--
2.27.0
Synchronize the condition for the two calls to kvm_x86_sync_pir_to_irr.
The one in the reenter-guest fast path invoked the callback
unconditionally even if LAPIC is disabled.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a403d92833f..441f4769173e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9849,7 +9849,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
break;
- if (vcpu->arch.apicv_active)
+ if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
static_call(kvm_x86_sync_pir_to_irr)(vcpu);
if (unlikely(kvm_vcpu_exit_request(vcpu))) {
--
2.27.0
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25645
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 401eae66e1aa - block: avoid to quiesce queue in elevator_init_mq
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 6:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 💥 stress: stress-ng - os
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 3:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - nvmeof-mp
Host 4:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
s390x:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 3:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 4:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
A commit introduced formal regstration of all Fabric nodes to the SCSI
transport as well as REG/UNREG RPI mailbox requests. The commit
introduced the NLP_RELEASE_RPI flag for rports set in the
lpfc_cmpl_els_logo_acc() routine to help clean up the RPIs. This new
code caused the driver to release the RPI value used for the remote port
and marked the RPI invalid. When the driver later attempted to re-login,
it would use the invalid RPI and the adapter rejected the PLOGI request.
As no login occurred, the devloss timer on the rport expired and
connectivity was lost.
This patch corrects the code by removing the snippet that requests the
rpi to be unregistered. This change only occurs on a node that is already
marked to be rediscovered. This puts the code back to its original
behavior, preserving the already-assigned rpi value (registered or not)
which can be used on the re-login attempts.
Fixes: fe83e3b9b422 ("scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller")
Cc: <stable(a)vger.kernel.org> # v5.14+
Co-developed-by: Paul Ely <paul.ely(a)broadcom.com>
Signed-off-by: Paul Ely <paul.ely(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
---
drivers/scsi/lpfc/lpfc_els.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index b940e0268f96..e83453bea2ae 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -5095,14 +5095,9 @@ lpfc_cmpl_els_logo_acc(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
/* NPort Recovery mode or node is just allocated */
if (!lpfc_nlp_not_used(ndlp)) {
/* A LOGO is completing and the node is in NPR state.
- * If this a fabric node that cleared its transport
- * registration, release the rpi.
+ * Just unregister the RPI because the node is still
+ * required.
*/
- spin_lock_irq(&ndlp->lock);
- ndlp->nlp_flag &= ~NLP_NPR_2B_DISC;
- if (phba->sli_rev == LPFC_SLI_REV4)
- ndlp->nlp_flag |= NLP_RELEASE_RPI;
- spin_unlock_irq(&ndlp->lock);
lpfc_unreg_rpi(vport, ndlp);
} else {
/* Indicate the node has already released, should
--
2.26.2
When recursively clearing out disconnected pts, the range based TLB
flush in handle_removed_tdp_mmu_page uses the wrong starting GFN,
resulting in the flush mostly missing the affected range. Fix this by
using base_gfn for the flush.
In response to feedback from David Matlack on the RFC version of this
patch, also move a few definitions into the for loop in the function to
prevent unintended references to them in the future.
Fixes: a066e61f13cf ("KVM: x86/mmu: Factor out handling of removed page tables")
CC: stable(a)vger.kernel.org
Signed-off-by: Ben Gardon <bgardon(a)google.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7c5dd83e52de..4bd541050d21 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -317,9 +317,6 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(pt));
int level = sp->role.level;
gfn_t base_gfn = sp->gfn;
- u64 old_child_spte;
- u64 *sptep;
- gfn_t gfn;
int i;
trace_kvm_mmu_prepare_zap_page(sp);
@@ -327,8 +324,9 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
tdp_mmu_unlink_page(kvm, sp, shared);
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
- sptep = rcu_dereference(pt) + i;
- gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
+ u64 *sptep = rcu_dereference(pt) + i;
+ gfn_t gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
+ u64 old_child_spte;
if (shared) {
/*
@@ -374,7 +372,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, tdp_ptep_t pt,
shared);
}
- kvm_flush_remote_tlbs_with_address(kvm, gfn,
+ kvm_flush_remote_tlbs_with_address(kvm, base_gfn,
KVM_PAGES_PER_HPAGE(level + 1));
call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
--
2.34.0.rc1.387.gb447b232ab-goog
Some devices tend to trigger SYS_ERR interrupt while the host handling
SYS_ERR state of the device during power up. This creates a race
condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
handling in mhi_async_power_up(). Once the host detects that the device
is in SYS_ERR state, it issues MHI_RESET and waits for the device to
process the reset request. During this time, the device triggers SYS_ERR
interrupt to the host and host starts handling SYS_ERR execution.
So by the time the device has completed reset, host starts SYS_ERR
handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check
to avoid getting spurious IRQs from the device.
Cc: stable(a)vger.kernel.org
Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
Reported-by: Aleksander Morgado <aleksander(a)aleksander.es>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
Changes in v4:
* Reverted the change that moved BHI_INTVEC as that was causing issue as
reported by Aleksander.
Changes in v3:
* Moved BHI_INTVEC setup after irq setup
* Used interval_us as the delay for the polling API
Changes in v2:
* Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
drivers/bus/mhi/core/pm.c | 33 +++++++++++----------------------
1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index fb99e3727155..21484a61bbed 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -1038,7 +1038,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
enum mhi_ee_type current_ee;
enum dev_st_transition next_state;
struct device *dev = &mhi_cntrl->mhi_dev->dev;
- u32 val;
+ u32 interval_us = 25000; /* poll register field every 25 milliseconds */
int ret;
dev_info(dev, "Requested to power ON\n");
@@ -1055,10 +1055,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
mutex_lock(&mhi_cntrl->pm_mutex);
mhi_cntrl->pm_state = MHI_PM_DISABLE;
- ret = mhi_init_irq_setup(mhi_cntrl);
- if (ret)
- goto error_setup_irq;
-
/* Setup BHI INTVEC */
write_lock_irq(&mhi_cntrl->pm_lock);
mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
@@ -1072,7 +1068,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
dev_err(dev, "%s is not a valid EE for power on\n",
TO_MHI_EXEC_STR(current_ee));
ret = -EIO;
- goto error_async_power_up;
+ goto error_setup_irq;
}
state = mhi_get_mhi_state(mhi_cntrl);
@@ -1081,20 +1077,12 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
if (state == MHI_STATE_SYS_ERR) {
mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
- ret = wait_event_timeout(mhi_cntrl->state_event,
- MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state) ||
- mhi_read_reg_field(mhi_cntrl,
- mhi_cntrl->regs,
- MHICTRL,
- MHICTRL_RESET_MASK,
- MHICTRL_RESET_SHIFT,
- &val) ||
- !val,
- msecs_to_jiffies(mhi_cntrl->timeout_ms));
- if (!ret) {
- ret = -EIO;
+ ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
+ MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0,
+ interval_us);
+ if (ret) {
dev_info(dev, "Failed to reset MHI due to syserr state\n");
- goto error_async_power_up;
+ goto error_setup_irq;
}
/*
@@ -1104,6 +1092,10 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
}
+ ret = mhi_init_irq_setup(mhi_cntrl);
+ if (ret)
+ goto error_setup_irq;
+
/* Transition to next state */
next_state = MHI_IN_PBL(current_ee) ?
DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
@@ -1116,9 +1108,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
return 0;
-error_async_power_up:
- mhi_deinit_free_irq(mhi_cntrl);
-
error_setup_irq:
mhi_cntrl->pm_state = MHI_PM_DISABLE;
mutex_unlock(&mhi_cntrl->pm_mutex);
--
2.25.1
The base address of uartlite registers could be 64 bit address which is from
device resource. When ulite_probe() calls ulite_assign(), this 64 bit
address is casted to 32-bit. The fix is to replace "u32" type with
"phys_addr_t" type for the base address in ulite_assign() argument list.
Fixes: 8fa7b6100693 ("[POWERPC] Uartlite: Separate the bus binding from the driver proper")
Signed-off-by: Lizhi Hou <lizhi.hou(a)xilinx.com>
---
drivers/tty/serial/uartlite.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/uartlite.c b/drivers/tty/serial/uartlite.c
index d3d9566e5dbd..e1fa52d31474 100644
--- a/drivers/tty/serial/uartlite.c
+++ b/drivers/tty/serial/uartlite.c
@@ -626,7 +626,7 @@ static struct uart_driver ulite_uart_driver = {
*
* Returns: 0 on success, <0 otherwise
*/
-static int ulite_assign(struct device *dev, int id, u32 base, int irq,
+static int ulite_assign(struct device *dev, int id, phys_addr_t base, int irq,
struct uartlite_data *pdata)
{
struct uart_port *port;
--
2.27.0
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 126e8bee943e9926238c891e2df5b5573aee76bc Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:18 -0800
Subject: [PATCH] ipc: WARN if trying to remove ipc object which is absent
Patch series "shm: shm_rmid_forced feature fixes".
Some time ago I met kernel crash after CRIU restore procedure,
fortunately, it was CRIU restore, so, I had dump files and could do
restore many times and crash reproduced easily. After some
investigation I've constructed the minimal reproducer. It was found
that it's use-after-free and it happens only if sysctl
kernel.shm_rmid_forced = 1.
The key of the problem is that the exit_shm() function not handles shp's
object destroy when task->sysvshm.shm_clist contains items from
different IPC namespaces. In most cases this list will contain only
items from one IPC namespace.
How can this list contain object from different namespaces? The
exit_shm() function is designed to clean up this list always when
process leaves IPC namespace. But we made a mistake a long time ago and
did not add a exit_shm() call into the setns() syscall procedures.
The first idea was just to add this call to setns() syscall but it
obviously changes semantics of setns() syscall and that's
userspace-visible change. So, I gave up on this idea.
The first real attempt to address the issue was just to omit forced
destroy if we meet shp object not from current task IPC namespace [1].
But that was not the best idea because task->sysvshm.shm_clist was
protected by rwsem which belongs to current task IPC namespace. It
means that list corruption may occur.
Second approach is just extend exit_shm() to properly handle shp's from
different IPC namespaces [2]. This is really non-trivial thing, I've
put a lot of effort into that but not believed that it's possible to
make it fully safe, clean and clear.
Thanks to the efforts of Manfred Spraul working an elegant solution was
designed. Thanks a lot, Manfred!
Eric also suggested the way to address the issue in ("[RFC][PATCH] shm:
In shm_exit destroy all created and never attached segments") Eric's
idea was to maintain a list of shm_clists one per IPC namespace, use
lock-less lists. But there is some extra memory consumption-related
concerns.
An alternative solution which was suggested by me was implemented in
("shm: reset shm_clist on setns but omit forced shm destroy"). The idea
is pretty simple, we add exit_shm() syscall to setns() but DO NOT
destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just
clean up the task->sysvshm.shm_clist list.
This chages semantics of setns() syscall a little bit but in comparision
to the "naive" solution when we just add exit_shm() without any special
exclusions this looks like a safer option.
[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736
This patch (of 2):
Let's produce a warning if we trying to remove non-existing IPC object
from IPC namespace kht/idr structures.
This allows us to catch possible bugs when the ipc_rmid() function was
called with inconsistent struct ipc_ids*, struct kern_ipc_perm*
arguments.
Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@vir…
Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@vir…
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/ipc/util.c b/ipc/util.c
index d48d8cfa1f3f..fa2d86ef3fb8 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -447,8 +447,8 @@ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
static void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
if (ipcp->key != IPC_PRIVATE)
- rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
- ipc_kht_params);
+ WARN_ON_ONCE(rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
+ ipc_kht_params));
}
/**
@@ -498,7 +498,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
int idx = ipcid_to_idx(ipcp->id);
- idr_remove(&ids->ipcs_idr, idx);
+ WARN_ON_ONCE(idr_remove(&ids->ipcs_idr, idx) != ipcp);
ipc_kht_remove(ids, ipcp);
ids->in_use--;
ipcp->deleted = true;
Hi Len,
I have a report (https://bugs.launchpad.net/bugs/1952094) that commit
f980d055a0f858d73d9467bb0b570721bbfcdfb8 ("CIFS: Fix a potencially
linear read overflow") causes a regression as a stable backport in a 5.4
based kernel. I don't know if this regression exists in tip as well, or
if it is unique to the backported environment. I suspect, given the
content of the patch, that it is generic. As such, it has been
backported to a number of stable releases:
linux-4.4.y.txt:0955df2d9bf4857e3e2287e3028903e6cec06c30
linux-4.9.y.txt:8878af780747f498551b7d360cae61b415798f18
linux-4.14.y.txt:20967547ffc6039f17c63a1c24eb779ee166b245
linux-4.19.y.txt:bea655491daf39f1934a71bf576bf3499092d3a4
linux-5.4.y.txt:b444064a0e0ef64491b8739a9ae05a952b5f8974
linux-5.10.y.txt:6c4857203ffa36918136756a889b12c5864bc4ad
linux-5.13.y.txt:9bffe470e9b537075345406512df01ca2188b725
linux-5.14.y.txt:c41dd61c86482ab34f6f039b13296308018fd99b
Could this be an off-by-one issue if the source string is full length ?
rtg
--
-----------
Tim Gardner
Canonical, Inc
Hi Greg,
attached are git bundles for some patches you merged into the 5.10
stable kernel already this morning.
Naming should be obvious, the patches are on the branch "back" in
each bundle.
Juergen
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 126e8bee943e9926238c891e2df5b5573aee76bc Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:18 -0800
Subject: [PATCH] ipc: WARN if trying to remove ipc object which is absent
Patch series "shm: shm_rmid_forced feature fixes".
Some time ago I met kernel crash after CRIU restore procedure,
fortunately, it was CRIU restore, so, I had dump files and could do
restore many times and crash reproduced easily. After some
investigation I've constructed the minimal reproducer. It was found
that it's use-after-free and it happens only if sysctl
kernel.shm_rmid_forced = 1.
The key of the problem is that the exit_shm() function not handles shp's
object destroy when task->sysvshm.shm_clist contains items from
different IPC namespaces. In most cases this list will contain only
items from one IPC namespace.
How can this list contain object from different namespaces? The
exit_shm() function is designed to clean up this list always when
process leaves IPC namespace. But we made a mistake a long time ago and
did not add a exit_shm() call into the setns() syscall procedures.
The first idea was just to add this call to setns() syscall but it
obviously changes semantics of setns() syscall and that's
userspace-visible change. So, I gave up on this idea.
The first real attempt to address the issue was just to omit forced
destroy if we meet shp object not from current task IPC namespace [1].
But that was not the best idea because task->sysvshm.shm_clist was
protected by rwsem which belongs to current task IPC namespace. It
means that list corruption may occur.
Second approach is just extend exit_shm() to properly handle shp's from
different IPC namespaces [2]. This is really non-trivial thing, I've
put a lot of effort into that but not believed that it's possible to
make it fully safe, clean and clear.
Thanks to the efforts of Manfred Spraul working an elegant solution was
designed. Thanks a lot, Manfred!
Eric also suggested the way to address the issue in ("[RFC][PATCH] shm:
In shm_exit destroy all created and never attached segments") Eric's
idea was to maintain a list of shm_clists one per IPC namespace, use
lock-less lists. But there is some extra memory consumption-related
concerns.
An alternative solution which was suggested by me was implemented in
("shm: reset shm_clist on setns but omit forced shm destroy"). The idea
is pretty simple, we add exit_shm() syscall to setns() but DO NOT
destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just
clean up the task->sysvshm.shm_clist list.
This chages semantics of setns() syscall a little bit but in comparision
to the "naive" solution when we just add exit_shm() without any special
exclusions this looks like a safer option.
[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736
This patch (of 2):
Let's produce a warning if we trying to remove non-existing IPC object
from IPC namespace kht/idr structures.
This allows us to catch possible bugs when the ipc_rmid() function was
called with inconsistent struct ipc_ids*, struct kern_ipc_perm*
arguments.
Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@vir…
Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@vir…
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/ipc/util.c b/ipc/util.c
index d48d8cfa1f3f..fa2d86ef3fb8 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -447,8 +447,8 @@ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
static void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
if (ipcp->key != IPC_PRIVATE)
- rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
- ipc_kht_params);
+ WARN_ON_ONCE(rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
+ ipc_kht_params));
}
/**
@@ -498,7 +498,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
int idx = ipcid_to_idx(ipcp->id);
- idr_remove(&ids->ipcs_idr, idx);
+ WARN_ON_ONCE(idr_remove(&ids->ipcs_idr, idx) != ipcp);
ipc_kht_remove(ids, ipcp);
ids->in_use--;
ipcp->deleted = true;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 473441720c8616dfaf4451f9c7ea14f0eb5e5d65 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Thu, 25 Nov 2021 14:05:18 +0100
Subject: [PATCH] fuse: release pipe buf after last use
Checking buf->flags should be done before the pipe_buf_release() is called
on the pipe buffer, since releasing the buffer might modify the flags.
This is exactly what page_cache_pipe_buf_release() does, and which results
in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
trying to fix.
Reported-by: Justin Forbes <jmforbes(a)linuxtx.org>
Fixes: 712a951025c0 ("fuse: fix page stealing")
Cc: <stable(a)vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 79f7eda49e06..cd54a529460d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -847,17 +847,17 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
replace_page_cache_page(oldpage, newpage);
+ get_page(newpage);
+
+ if (!(buf->flags & PIPE_BUF_FLAG_LRU))
+ lru_cache_add(newpage);
+
/*
* Release while we have extra ref on stolen page. Otherwise
* anon_pipe_buf_release() might think the page can be reused.
*/
pipe_buf_release(cs->pipe, buf);
- get_page(newpage);
-
- if (!(buf->flags & PIPE_BUF_FLAG_LRU))
- lru_cache_add(newpage);
-
err = 0;
spin_lock(&cs->req->waitq.lock);
if (test_bit(FR_ABORTED, &cs->req->flags))
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 473441720c8616dfaf4451f9c7ea14f0eb5e5d65 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Thu, 25 Nov 2021 14:05:18 +0100
Subject: [PATCH] fuse: release pipe buf after last use
Checking buf->flags should be done before the pipe_buf_release() is called
on the pipe buffer, since releasing the buffer might modify the flags.
This is exactly what page_cache_pipe_buf_release() does, and which results
in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
trying to fix.
Reported-by: Justin Forbes <jmforbes(a)linuxtx.org>
Fixes: 712a951025c0 ("fuse: fix page stealing")
Cc: <stable(a)vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 79f7eda49e06..cd54a529460d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -847,17 +847,17 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
replace_page_cache_page(oldpage, newpage);
+ get_page(newpage);
+
+ if (!(buf->flags & PIPE_BUF_FLAG_LRU))
+ lru_cache_add(newpage);
+
/*
* Release while we have extra ref on stolen page. Otherwise
* anon_pipe_buf_release() might think the page can be reused.
*/
pipe_buf_release(cs->pipe, buf);
- get_page(newpage);
-
- if (!(buf->flags & PIPE_BUF_FLAG_LRU))
- lru_cache_add(newpage);
-
err = 0;
spin_lock(&cs->req->waitq.lock);
if (test_bit(FR_ABORTED, &cs->req->flags))
Hi Greg and Antonio,
The stable-rc 5.4 build is failing.
( 5.10 and 5.15 builds pass )
because new build getting these two extra configs.
CONFIG_GENERIC_COMPAT_VDSO=y
CONFIG_COMPAT_VDSO=y
These two configs are getting added by extra build variable
CROSS_COMPILE_COMPAT=arm-linux-gnueabihf-
This extra variable is coming from new tuxmake tool.
Doesn't build 32 bit vDSO for arm64
https://gitlab.com/Linaro/tuxmake/-/issues/160
ref:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
--
Linaro LKFT
https://lkft.linaro.org
From: Arnd Bergmann <arnd(a)arndb.de>
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
because the ordinary load/store instructions (ldr, ldrh, ldrb) can
tolerate any misalignment of the memory address. However, load/store
double and load/store multiple instructions (ldrd, ldm) may still only
be used on memory addresses that are 32-bit aligned, and so we have to
use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
may end up with a severe performance hit due to alignment traps that
require fixups by the kernel. Testing shows that this currently happens
with clang-13 but not gcc-11. In theory, any compiler version can
produce this bug or other problems, as we are dealing with undefined
behavior in C99 even on architectures that support this in hardware,
see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
Fortunately, the get_unaligned() accessors do the right thing: when
building for ARMv6 or later, the compiler will emit unaligned accesses
using the ordinary load/store instructions (but avoid the ones that
require 32-bit alignment). When building for older ARM, those accessors
will emit the appropriate sequence of ldrb/mov/orr instructions. And on
architectures that can truly tolerate any kind of misalignment, the
get_unaligned() accessors resolve to the leXX_to_cpup accessors that
operate on aligned addresses.
Since the compiler will in fact emit ldrd or ldm instructions when
building this code for ARM v6 or later, the solution is to use the
unaligned accessors unconditionally on architectures where this is
known to be fast. The _aligned version of the hash function is
however still needed to get the best performance on architectures
that cannot do any unaligned access in hardware.
This new version avoids the undefined behavior and should produce
the fastest hash on all architectures we support.
Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuv…
Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEA…
Reported-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF")
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
include/linux/siphash.h | 14 ++++----------
lib/siphash.c | 12 ++++++------
2 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/include/linux/siphash.h b/include/linux/siphash.h
index bf21591a9e5e..0cda61855d90 100644
--- a/include/linux/siphash.h
+++ b/include/linux/siphash.h
@@ -27,9 +27,7 @@ static inline bool siphash_key_is_zero(const siphash_key_t *key)
}
u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key);
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key);
-#endif
u64 siphash_1u64(const u64 a, const siphash_key_t *key);
u64 siphash_2u64(const u64 a, const u64 b, const siphash_key_t *key);
@@ -82,10 +80,9 @@ static inline u64 ___siphash_aligned(const __le64 *data, size_t len,
static inline u64 siphash(const void *data, size_t len,
const siphash_key_t *key)
{
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
+ if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+ !IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
return __siphash_unaligned(data, len, key);
-#endif
return ___siphash_aligned(data, len, key);
}
@@ -96,10 +93,8 @@ typedef struct {
u32 __hsiphash_aligned(const void *data, size_t len,
const hsiphash_key_t *key);
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_unaligned(const void *data, size_t len,
const hsiphash_key_t *key);
-#endif
u32 hsiphash_1u32(const u32 a, const hsiphash_key_t *key);
u32 hsiphash_2u32(const u32 a, const u32 b, const hsiphash_key_t *key);
@@ -135,10 +130,9 @@ static inline u32 ___hsiphash_aligned(const __le32 *data, size_t len,
static inline u32 hsiphash(const void *data, size_t len,
const hsiphash_key_t *key)
{
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- if (!IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT))
+ if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+ !IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT))
return __hsiphash_unaligned(data, len, key);
-#endif
return ___hsiphash_aligned(data, len, key);
}
diff --git a/lib/siphash.c b/lib/siphash.c
index a90112ee72a1..72b9068ab57b 100644
--- a/lib/siphash.c
+++ b/lib/siphash.c
@@ -49,6 +49,7 @@
SIPROUND; \
return (v0 ^ v1) ^ (v2 ^ v3);
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u64));
@@ -80,8 +81,8 @@ u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key)
POSTAMBLE
}
EXPORT_SYMBOL(__siphash_aligned);
+#endif
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u64));
@@ -113,7 +114,6 @@ u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key)
POSTAMBLE
}
EXPORT_SYMBOL(__siphash_unaligned);
-#endif
/**
* siphash_1u64 - compute 64-bit siphash PRF value of a u64
@@ -250,6 +250,7 @@ EXPORT_SYMBOL(siphash_3u32);
HSIPROUND; \
return (v0 ^ v1) ^ (v2 ^ v3);
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u64));
@@ -280,8 +281,8 @@ u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key)
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_aligned);
+#endif
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_unaligned(const void *data, size_t len,
const hsiphash_key_t *key)
{
@@ -313,7 +314,6 @@ u32 __hsiphash_unaligned(const void *data, size_t len,
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_unaligned);
-#endif
/**
* hsiphash_1u32 - compute 64-bit hsiphash PRF value of a u32
@@ -418,6 +418,7 @@ EXPORT_SYMBOL(hsiphash_4u32);
HSIPROUND; \
return v1 ^ v3;
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key)
{
const u8 *end = data + len - (len % sizeof(u32));
@@ -438,8 +439,8 @@ u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key)
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_aligned);
+#endif
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 __hsiphash_unaligned(const void *data, size_t len,
const hsiphash_key_t *key)
{
@@ -461,7 +462,6 @@ u32 __hsiphash_unaligned(const void *data, size_t len,
HPOSTAMBLE
}
EXPORT_SYMBOL(__hsiphash_unaligned);
-#endif
/**
* hsiphash_1u32 - compute 32-bit hsiphash PRF value of a u32
--
2.34.1
From: "Gustavo A. R. Silva" <gustavoars(a)kernel.org>
Use 2-factor argument form kvcalloc() instead of kvzalloc().
Link: https://github.com/KSPP/linux/issues/162
Cc: stable(a)vger.kernel.org
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
[Jason: Gustavo's link above is for KSPP, but this isn't actually a
security fix, as table_size is bounded to 8192 anyway, and gcc realizes
this, so the codegen comes out to be about the same.]
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
drivers/net/wireguard/ratelimiter.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireguard/ratelimiter.c b/drivers/net/wireguard/ratelimiter.c
index 3fedd1d21f5e..dd55e5c26f46 100644
--- a/drivers/net/wireguard/ratelimiter.c
+++ b/drivers/net/wireguard/ratelimiter.c
@@ -176,12 +176,12 @@ int wg_ratelimiter_init(void)
(1U << 14) / sizeof(struct hlist_head)));
max_entries = table_size * 8;
- table_v4 = kvzalloc(table_size * sizeof(*table_v4), GFP_KERNEL);
+ table_v4 = kvcalloc(table_size, sizeof(*table_v4), GFP_KERNEL);
if (unlikely(!table_v4))
goto err_kmemcache;
#if IS_ENABLED(CONFIG_IPV6)
- table_v6 = kvzalloc(table_size * sizeof(*table_v6), GFP_KERNEL);
+ table_v6 = kvcalloc(table_size, sizeof(*table_v6), GFP_KERNEL);
if (unlikely(!table_v6)) {
kvfree(table_v4);
goto err_kmemcache;
--
2.34.1
If we're being delivered packets from multiple CPUs so quickly that the
ring lock is contended for CPU tries, then it's safe to assume that the
queue is near capacity anyway, so just drop the packet rather than
spinning. This helps deal with multicore DoS that can interfere with
data path performance. It _still_ does not completely fix the issue, but
it again chips away at it.
Reported-by: Streun Fabio <fstreun(a)student.ethz.ch>
Cc: stable(a)vger.kernel.org
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
drivers/net/wireguard/receive.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
index f4e537e3e8ec..7b8df406c773 100644
--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -554,9 +554,19 @@ void wg_packet_receive(struct wg_device *wg, struct sk_buff *skb)
case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION):
case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE):
case cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE): {
- int cpu;
- if (unlikely(!rng_is_initialized() ||
- ptr_ring_produce_bh(&wg->handshake_queue.ring, skb))) {
+ int cpu, ret = -EBUSY;
+
+ if (unlikely(!rng_is_initialized()))
+ goto drop;
+ if (atomic_read(&wg->handshake_queue_len) > MAX_QUEUED_INCOMING_HANDSHAKES / 2) {
+ if (spin_trylock_bh(&wg->handshake_queue.ring.producer_lock)) {
+ ret = __ptr_ring_produce(&wg->handshake_queue.ring, skb);
+ spin_unlock_bh(&wg->handshake_queue.ring.producer_lock);
+ }
+ } else
+ ret = ptr_ring_produce_bh(&wg->handshake_queue.ring, skb);
+ if (ret) {
+ drop:
net_dbg_skb_ratelimited("%s: Dropping handshake packet from %pISpfsc\n",
wg->dev->name, skb);
goto err;
--
2.34.1
The selftests currently parse the kernel log at the end to track
potential memory leaks. With these tests now reading off the end of the
buffer, due to recent optimizations, some creation messages were lost,
making the tests think that there was a free without an alloc. Fix this
by increasing the kernel log size.
Fixes: 24b70eeeb4f4 ("wireguard: use synchronize_net rather than synchronize_rcu")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
tools/testing/selftests/wireguard/qemu/kernel.config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/wireguard/qemu/kernel.config b/tools/testing/selftests/wireguard/qemu/kernel.config
index 74db83a0aedd..a9b5a520a1d2 100644
--- a/tools/testing/selftests/wireguard/qemu/kernel.config
+++ b/tools/testing/selftests/wireguard/qemu/kernel.config
@@ -66,6 +66,7 @@ CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=15
+CONFIG_LOG_BUF_SHIFT=18
CONFIG_PRINTK_TIME=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_LEGACY_VSYSCALL_NONE=y
--
2.34.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0cc318d2e8408bc0ffb4662a0c3e5e57005ac6ff Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Tue, 7 Sep 2021 09:25:40 +0000
Subject: [PATCH] iavf: Fix deadlock occurrence during resetting VF interface
System hangs if close the interface is called from the kernel during
the interface is in resetting state.
During resetting operation the link is closing but kernel didn't
know it and it tried to close this interface again what sometimes
led to deadlock.
Inform kernel about current state of interface
and turn off the flag IFF_UP when interface is closing until reset
is finished.
Previously it was most likely to hang the system when kernel
(network manager) tried to close the interface in the same time
when interface was in resetting state because of deadlock.
Fixes: 3c8e0b989aa1 ("i40vf: don't stop me now")
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin(a)intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 336e6bf95e48..84680777ac12 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2254,6 +2254,7 @@ static void iavf_reset_task(struct work_struct *work)
(adapter->state == __IAVF_RESETTING));
if (running) {
+ netdev->flags &= ~IFF_UP;
netif_carrier_off(netdev);
netif_tx_stop_all_queues(netdev);
adapter->link_up = false;
@@ -2365,7 +2366,7 @@ static void iavf_reset_task(struct work_struct *work)
* to __IAVF_RUNNING
*/
iavf_up_complete(adapter);
-
+ netdev->flags |= IFF_UP;
iavf_irq_enable(adapter, true);
} else {
iavf_change_state(adapter, __IAVF_DOWN);
@@ -2378,8 +2379,10 @@ static void iavf_reset_task(struct work_struct *work)
reset_err:
mutex_unlock(&adapter->client_lock);
mutex_unlock(&adapter->crit_lock);
- if (running)
+ if (running) {
iavf_change_state(adapter, __IAVF_RUNNING);
+ netdev->flags |= IFF_UP;
+ }
dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
iavf_close(netdev);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0cc318d2e8408bc0ffb4662a0c3e5e57005ac6ff Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Tue, 7 Sep 2021 09:25:40 +0000
Subject: [PATCH] iavf: Fix deadlock occurrence during resetting VF interface
System hangs if close the interface is called from the kernel during
the interface is in resetting state.
During resetting operation the link is closing but kernel didn't
know it and it tried to close this interface again what sometimes
led to deadlock.
Inform kernel about current state of interface
and turn off the flag IFF_UP when interface is closing until reset
is finished.
Previously it was most likely to hang the system when kernel
(network manager) tried to close the interface in the same time
when interface was in resetting state because of deadlock.
Fixes: 3c8e0b989aa1 ("i40vf: don't stop me now")
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin(a)intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 336e6bf95e48..84680777ac12 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2254,6 +2254,7 @@ static void iavf_reset_task(struct work_struct *work)
(adapter->state == __IAVF_RESETTING));
if (running) {
+ netdev->flags &= ~IFF_UP;
netif_carrier_off(netdev);
netif_tx_stop_all_queues(netdev);
adapter->link_up = false;
@@ -2365,7 +2366,7 @@ static void iavf_reset_task(struct work_struct *work)
* to __IAVF_RUNNING
*/
iavf_up_complete(adapter);
-
+ netdev->flags |= IFF_UP;
iavf_irq_enable(adapter, true);
} else {
iavf_change_state(adapter, __IAVF_DOWN);
@@ -2378,8 +2379,10 @@ static void iavf_reset_task(struct work_struct *work)
reset_err:
mutex_unlock(&adapter->client_lock);
mutex_unlock(&adapter->crit_lock);
- if (running)
+ if (running) {
iavf_change_state(adapter, __IAVF_RUNNING);
+ netdev->flags |= IFF_UP;
+ }
dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
iavf_close(netdev);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0cc318d2e8408bc0ffb4662a0c3e5e57005ac6ff Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Tue, 7 Sep 2021 09:25:40 +0000
Subject: [PATCH] iavf: Fix deadlock occurrence during resetting VF interface
System hangs if close the interface is called from the kernel during
the interface is in resetting state.
During resetting operation the link is closing but kernel didn't
know it and it tried to close this interface again what sometimes
led to deadlock.
Inform kernel about current state of interface
and turn off the flag IFF_UP when interface is closing until reset
is finished.
Previously it was most likely to hang the system when kernel
(network manager) tried to close the interface in the same time
when interface was in resetting state because of deadlock.
Fixes: 3c8e0b989aa1 ("i40vf: don't stop me now")
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin(a)intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 336e6bf95e48..84680777ac12 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2254,6 +2254,7 @@ static void iavf_reset_task(struct work_struct *work)
(adapter->state == __IAVF_RESETTING));
if (running) {
+ netdev->flags &= ~IFF_UP;
netif_carrier_off(netdev);
netif_tx_stop_all_queues(netdev);
adapter->link_up = false;
@@ -2365,7 +2366,7 @@ static void iavf_reset_task(struct work_struct *work)
* to __IAVF_RUNNING
*/
iavf_up_complete(adapter);
-
+ netdev->flags |= IFF_UP;
iavf_irq_enable(adapter, true);
} else {
iavf_change_state(adapter, __IAVF_DOWN);
@@ -2378,8 +2379,10 @@ static void iavf_reset_task(struct work_struct *work)
reset_err:
mutex_unlock(&adapter->client_lock);
mutex_unlock(&adapter->crit_lock);
- if (running)
+ if (running) {
iavf_change_state(adapter, __IAVF_RUNNING);
+ netdev->flags |= IFF_UP;
+ }
dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
iavf_close(netdev);
}