This is a note to let you know that I've just added the patch titled
coresight: trbe: Defer the probe on offline CPUs
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From a08025b3fe56185290a1ea476581f03ca733f967 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose@arm.com>
Date: Thu, 14 Oct 2021 15:22:38 +0100
Subject: coresight: trbe: Defer the probe on offline CPUs
If a CPU is offline during the driver init, we could end up causing
a kernel crash trying to register the coresight device for the TRBE
instance. The trbe_cpudata for the TRBE instance is initialized only
when it is probed. Otherwise, we could end up dereferencing a NULL
cpudata->drvdata.
e.g:
[ 0.149999] coresight ete0: CPU0: ete v1.1 initialized
[ 0.149999] coresight-etm4x ete_1: ETM arch init failed
[ 0.149999] coresight-etm4x: probe of ete_1 failed with error -22
[ 0.150085] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
[ 0.150085] Mem abort info:
[ 0.150085] ESR = 0x96000005
[ 0.150085] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.150085] SET = 0, FnV = 0
[ 0.150085] EA = 0, S1PTW = 0
[ 0.150085] Data abort info:
[ 0.150085] ISV = 0, ISS = 0x00000005
[ 0.150085] CM = 0, WnR = 0
[ 0.150085] [0000000000000050] user address but active_mm is swapper
[ 0.150085] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 0.150085] Modules linked in:
[ 0.150085] Hardware name: FVP Base RevC (DT)
[ 0.150085] pstate: 00800009 (nzcv daif -PAN +UAO -TCO BTYPE=--)
[ 0.150155] pc : arm_trbe_register_coresight_cpu+0x74/0x144
[ 0.150155] lr : arm_trbe_register_coresight_cpu+0x48/0x144
...
[ 0.150237] Call trace:
[ 0.150237] arm_trbe_register_coresight_cpu+0x74/0x144
[ 0.150237] arm_trbe_device_probe+0x1c0/0x2d8
[ 0.150259] platform_drv_probe+0x94/0xbc
[ 0.150259] really_probe+0x1bc/0x4a8
[ 0.150266] driver_probe_device+0x7c/0xb8
[ 0.150266] device_driver_attach+0x6c/0xac
[ 0.150266] __driver_attach+0xc4/0x148
[ 0.150266] bus_for_each_dev+0x7c/0xc8
[ 0.150266] driver_attach+0x24/0x30
[ 0.150266] bus_add_driver+0x100/0x1e0
[ 0.150266] driver_register+0x78/0x110
[ 0.150266] __platform_driver_register+0x44/0x50
[ 0.150266] arm_trbe_init+0x28/0x84
[ 0.150266] do_one_initcall+0x94/0x2bc
[ 0.150266] do_initcall_level+0xa4/0x158
[ 0.150266] do_initcalls+0x54/0x94
[ 0.150319] do_basic_setup+0x24/0x30
[ 0.150319] kernel_init_freeable+0xe8/0x14c
[ 0.150319] kernel_init+0x14/0x18c
[ 0.150319] ret_from_fork+0x10/0x30
[ 0.150319] Code: f94012c8 b0004ce2 9134a442 52819801 (f9402917)
[ 0.150319] ---[ end trace d23e0cfe5098535e ]---
[ 0.150346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Fix this by skipping the step if we are unable to probe the CPU.
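The deferral works because the driver also registers CPU hotplug
callbacks that can probe the TRBE once the CPU comes online. A minimal
sketch of that pattern (the callback and field names here are
illustrative, not necessarily the driver's exact symbols):

static int arm_trbe_cpu_startup(unsigned int cpu, struct hlist_node *node)
{
        struct trbe_drvdata *drvdata =
                hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
        struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);

        /*
         * The hotplug callback runs on the CPU coming online, so the
         * per-CPU probe can be called directly rather than via
         * smp_call_function_single().
         */
        if (cpumask_test_cpu(cpu, &drvdata->supported_cpus) &&
            !cpudata->drvdata) {
                arm_trbe_probe_cpu(drvdata);
                if (cpudata->drvdata)
                        arm_trbe_register_coresight_cpu(drvdata, cpu);
        }
        return 0;
}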
Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
Reported-by: Branislav Rankov <branislav.rankov@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: stable <stable@vger.kernel.org>
Tested-by: Branislav Rankov <branislav.rankov@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20211014142238.2221248-1-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
drivers/hwtracing/coresight/coresight-trbe.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 2825ccb0cf39..5d77baba8b0f 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -893,6 +893,10 @@ static void arm_trbe_register_coresight_cpu(struct trbe_drvdata *drvdata, int cp
if (WARN_ON(trbe_csdev))
return;
+ /* If the TRBE was not probed on the CPU, we shouldn't be here */
+ if (WARN_ON(!cpudata->drvdata))
+ return;
+
dev = &cpudata->drvdata->pdev->dev;
desc.name = devm_kasprintf(dev, GFP_KERNEL, "trbe%d", cpu);
if (!desc.name)
@@ -974,7 +978,9 @@ static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
return -ENOMEM;
for_each_cpu(cpu, &drvdata->supported_cpus) {
- smp_call_function_single(cpu, arm_trbe_probe_cpu, drvdata, 1);
+ /* If we fail to probe the CPU, let us defer it to hotplug callbacks */
+ if (smp_call_function_single(cpu, arm_trbe_probe_cpu, drvdata, 1))
+ continue;
if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
arm_trbe_register_coresight_cpu(drvdata, cpu);
if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
--
2.33.1
This is a note to let you know that I've just added the patch titled
coresight: trbe: Fix incorrect access of the sink specific data
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From bb5293e334af51b19b62d8bef1852ea13e935e9b Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose@arm.com>
Date: Tue, 21 Sep 2021 14:41:05 +0100
Subject: coresight: trbe: Fix incorrect access of the sink specific data
The TRBE driver wrongly treats the aux private data as the TRBE
driver-specific buffer for a given perf handle, while it is actually
the ETM PMU's event-specific data. Fix this by correcting the instance
to use the appropriate helper.
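For context, perf_get_aux() returns the ETM PMU's per-event data, and
the TRBE buffer hangs off that data as the sink configuration. A rough
sketch of the relationship (the field name in the etm-perf layer is
assumed here):

/* What the buggy code fetched: the ETM event data, not the buffer */
struct etm_event_data *data = perf_get_aux(handle);

/* Assumed shape of the correct helper: */
static void *etm_perf_sink_config(struct perf_output_handle *handle)
{
        struct etm_event_data *data = perf_get_aux(handle);

        /* the sink-specific data, i.e. the TRBE's struct trbe_buf */
        return data ? data->snk_config : NULL;
}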
Cc: stable <stable@vger.kernel.org>
Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20210921134121.2423546-2-suzuki.poulose@arm.com
[Fixed 13 character SHA down to 12]
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index a53ee98f312f..2825ccb0cf39 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -382,7 +382,7 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
{
- struct trbe_buf *buf = perf_get_aux(handle);
+ struct trbe_buf *buf = etm_perf_sink_config(handle);
u64 limit = __trbe_normal_offset(handle);
u64 head = PERF_IDX2OFF(handle->head, buf);
--
2.33.1
This is a note to let you know that I've just added the patch titled
coresight: cti: Correct the parameter for pm_runtime_put
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 692c9a499b286ea478f41b23a91fe3873b9e1326 Mon Sep 17 00:00:00 2001
From: Tao Zhang <quic_taozha@quicinc.com>
Date: Thu, 19 Aug 2021 17:29:37 +0800
Subject: coresight: cti: Correct the parameter for pm_runtime_put
The input parameter of pm_runtime_put should be the same in the
functions cti_enable_hw and cti_disable_hw. The correct parameter to
use here is dev->parent.
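Runtime PM references must be taken and dropped on the same struct
device. A condensed sketch of the intended pairing (simplified; locking
and error handling elided):

static int cti_enable_hw(struct cti_drvdata *drvdata)
{
        struct device *dev = &drvdata->csdev->dev;

        pm_runtime_get_sync(dev->parent);       /* reference on the parent */
        /* ... claim and program the CTI ... */
        return 0;
}

static int cti_disable_hw(struct cti_drvdata *drvdata)
{
        struct device *dev = &drvdata->csdev->dev;

        /* ... disable the CTI ... */
        pm_runtime_put(dev->parent);            /* drop the same reference */
        return 0;
}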
Signed-off-by: Tao Zhang <quic_taozha@quicinc.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1629365377-5937-1-git-send-email-quic_taozha@quic…
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
drivers/hwtracing/coresight/coresight-cti-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index e2a3620cbf48..8988b2ed2ea6 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -175,7 +175,7 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
coresight_disclaim_device_unlocked(csdev);
CS_LOCK(drvdata->base);
spin_unlock(&drvdata->spinlock);
- pm_runtime_put(dev);
+ pm_runtime_put(dev->parent);
return 0;
/* not disabled this call */
--
2.33.1
This is a note to let you know that I've just added the patch titled
usb: gadget: Mark USB_FSL_QE broken on 64-bit
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From a0548b26901f082684ad1fb3ba397d2de3a1406a Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Wed, 27 Oct 2021 10:08:49 +0200
Subject: usb: gadget: Mark USB_FSL_QE broken on 64-bit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On 64-bit:
drivers/usb/gadget/udc/fsl_qe_udc.c: In function ‘qe_ep0_rx’:
drivers/usb/gadget/udc/fsl_qe_udc.c:842:13: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
842 | vaddr = (u32)phys_to_virt(in_be32(&bd->buf));
| ^
In file included from drivers/usb/gadget/udc/fsl_qe_udc.c:41:
drivers/usb/gadget/udc/fsl_qe_udc.c:843:28: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
843 | frame_set_data(pframe, (u8 *)vaddr);
| ^
The driver assumes physical and virtual addresses are 32-bit, hence it
cannot work on 64-bit platforms.
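The underlying problem is plain pointer truncation. A standalone
illustration of the same class of bug:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        int buf;
        void *p = &buf;
        /* On 64-bit, this keeps only the low 32 bits of the address: */
        uint32_t v = (uint32_t)(uintptr_t)p;
        void *q = (void *)(uintptr_t)v;

        /* q equals p only if the address happens to fit in 32 bits */
        printf("%p vs %p\n", p, q);
        return 0;
}

A proper driver-side fix would carry the address in an unsigned long or
a pointer type throughout; marking the config BROKEN on 64-bit
sidesteps that until someone does the conversion.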
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20211027080849.3276289-1-geert@linux-m68k.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/usb/gadget/udc/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/udc/Kconfig b/drivers/usb/gadget/udc/Kconfig
index 8c614bb86c66..69394dc1cdfb 100644
--- a/drivers/usb/gadget/udc/Kconfig
+++ b/drivers/usb/gadget/udc/Kconfig
@@ -330,6 +330,7 @@ config USB_AMD5536UDC
config USB_FSL_QE
tristate "Freescale QE/CPM USB Device Controller"
depends on FSL_SOC && (QUICC_ENGINE || CPM)
+ depends on !64BIT || BROKEN
help
Some of Freescale PowerPC processors have a Full Speed
QE/CPM2 USB controller, which support device mode with 4
--
2.33.1
Reading from the CMOS involves writing to the index register and then
reading from the data register. Therefore access to the CMOS has to be
serialized with rtc_lock. This invocation of CMOS_READ was not
serialized, which could cause trouble when other code is accessing CMOS
at the same time.
Use spin_lock_irq() like the rest of the function.
Nothing in the kernel modifies the RTC_DM_BINARY bit, so there could be
a separate pair of spin_lock_irq() / spin_unlock_irq() calls before
doing the math.
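The lock is required because CMOS_READ is a two-step index/data
sequence, roughly the following on x86; a concurrent CMOS user can move
the index register between the two steps if rtc_lock is not held across
both:

/* Approximately what CMOS_READ(addr) expands to: */
outb_p(addr, RTC_PORT(0));      /* select register via index port 0x70 */
val = inb_p(RTC_PORT(1));       /* read its value via data port 0x71 */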
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Reviewed-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: stable@vger.kernel.org
---
drivers/rtc/rtc-cmos.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 4eb53412b808..dc3f8b0dde98 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -457,7 +457,10 @@ static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)
min = t->time.tm_min;
sec = t->time.tm_sec;
+ spin_lock_irq(&rtc_lock);
rtc_control = CMOS_READ(RTC_CONTROL);
+ spin_unlock_irq(&rtc_lock);
+
if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
/* Writing 0xff means "don't care" or "match all". */
mon = (mon <= 12) ? bin2bcd(mon) : 0xff;
--
2.25.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af1a02aa23c37045e6adfcf074cf7dbac167a403 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:05 +0200
Subject: [PATCH] phy: phy_ethtool_ksettings_set: Lock the PHY while changing
 settings
There is a race condition where the PHY state machine can change
members of the phydev structure at the same time userspace requests a
change via ethtool. To prevent this, have phy_ethtool_ksettings_set
take the PHY lock.
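Condensed from the diff below, the resulting pattern holds the lock
across the whole read-modify-write of phydev and restarts
autonegotiation via the unlocked helper so the lock is not taken
recursively:

mutex_lock(&phydev->lock);
phydev->autoneg = autoneg;
/* ... update speed, duplex, advertising, mdix_ctrl ... */
_phy_start_aneg(phydev);        /* caller already holds phydev->lock */
mutex_unlock(&phydev->lock);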
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Reported-by: Walter Stoll <Walter.Stoll@duagon.com>
Suggested-by: Walter Stoll <Walter.Stoll@duagon.com>
Tested-by: Walter Stoll <Walter.Stoll@duagon.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d845afab1af7..a3bfb156c83d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -798,6 +798,7 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
duplex != DUPLEX_FULL)))
return -EINVAL;
+ mutex_lock(&phydev->lock);
phydev->autoneg = autoneg;
if (autoneg == AUTONEG_DISABLE) {
@@ -814,8 +815,9 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
phydev->mdix_ctrl = cmd->base.eth_tp_mdix_ctrl;
/* Restart the PHY */
- phy_start_aneg(phydev);
+ _phy_start_aneg(phydev);
+ mutex_unlock(&phydev->lock);
return 0;
}
EXPORT_SYMBOL(phy_ethtool_ksettings_set);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af1a02aa23c37045e6adfcf074cf7dbac167a403 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:05 +0200
Subject: [PATCH] phy: phy_ethtool_ksettings_set: Lock the PHY while changing
 settings
There is a race condition where the PHY state machine can change
members of the phydev structure at the same time userspace requests a
change via ethtool. To prevent this, have phy_ethtool_ksettings_set
take the PHY lock.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Reported-by: Walter Stoll <Walter.Stoll@duagon.com>
Suggested-by: Walter Stoll <Walter.Stoll@duagon.com>
Tested-by: Walter Stoll <Walter.Stoll@duagon.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d845afab1af7..a3bfb156c83d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -798,6 +798,7 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
duplex != DUPLEX_FULL)))
return -EINVAL;
+ mutex_lock(&phydev->lock);
phydev->autoneg = autoneg;
if (autoneg == AUTONEG_DISABLE) {
@@ -814,8 +815,9 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
phydev->mdix_ctrl = cmd->base.eth_tp_mdix_ctrl;
/* Restart the PHY */
- phy_start_aneg(phydev);
+ _phy_start_aneg(phydev);
+ mutex_unlock(&phydev->lock);
return 0;
}
EXPORT_SYMBOL(phy_ethtool_ksettings_set);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af1a02aa23c37045e6adfcf074cf7dbac167a403 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:05 +0200
Subject: [PATCH] phy: phy_ethtool_ksettings_set: Lock the PHY while changing
 settings
There is a race condition where the PHY state machine can change
members of the phydev structure at the same time userspace requests a
change via ethtool. To prevent this, have phy_ethtool_ksettings_set
take the PHY lock.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Reported-by: Walter Stoll <Walter.Stoll@duagon.com>
Suggested-by: Walter Stoll <Walter.Stoll@duagon.com>
Tested-by: Walter Stoll <Walter.Stoll@duagon.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d845afab1af7..a3bfb156c83d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -798,6 +798,7 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
duplex != DUPLEX_FULL)))
return -EINVAL;
+ mutex_lock(&phydev->lock);
phydev->autoneg = autoneg;
if (autoneg == AUTONEG_DISABLE) {
@@ -814,8 +815,9 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
phydev->mdix_ctrl = cmd->base.eth_tp_mdix_ctrl;
/* Restart the PHY */
- phy_start_aneg(phydev);
+ _phy_start_aneg(phydev);
+ mutex_unlock(&phydev->lock);
return 0;
}
EXPORT_SYMBOL(phy_ethtool_ksettings_set);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af1a02aa23c37045e6adfcf074cf7dbac167a403 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:05 +0200
Subject: [PATCH] phy: phy_ethtool_ksettings_set: Lock the PHY while changing
 settings
There is a race condition where the PHY state machine can change
members of the phydev structure at the same time userspace requests a
change via ethtool. To prevent this, have phy_ethtool_ksettings_set
take the PHY lock.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Reported-by: Walter Stoll <Walter.Stoll@duagon.com>
Suggested-by: Walter Stoll <Walter.Stoll@duagon.com>
Tested-by: Walter Stoll <Walter.Stoll@duagon.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d845afab1af7..a3bfb156c83d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -798,6 +798,7 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
duplex != DUPLEX_FULL)))
return -EINVAL;
+ mutex_lock(&phydev->lock);
phydev->autoneg = autoneg;
if (autoneg == AUTONEG_DISABLE) {
@@ -814,8 +815,9 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
phydev->mdix_ctrl = cmd->base.eth_tp_mdix_ctrl;
/* Restart the PHY */
- phy_start_aneg(phydev);
+ _phy_start_aneg(phydev);
+ mutex_unlock(&phydev->lock);
return 0;
}
EXPORT_SYMBOL(phy_ethtool_ksettings_set);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 707293a56f95f8e7e0cfae008010c7933fb68973 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:04 +0200
Subject: [PATCH] phy: phy_start_aneg: Add an unlocked version
Split phy_start_aneg into a wrapper which takes the PHY lock, and a
helper doing the real work. This will be needed when
phy_ethtool_ksettings_set takes the lock.
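This is the usual locked-wrapper split, condensed from the diff below:
the underscore helper asserts the lock instead of taking it, and the
public function becomes a thin lock/call/unlock shim:

static int _phy_start_aneg(struct phy_device *phydev)
{
        lockdep_assert_held(&phydev->lock);     /* caller holds the lock */
        /* ... sanitize settings, config aneg, check link status ... */
        return 0;
}

int phy_start_aneg(struct phy_device *phydev)
{
        int err;

        mutex_lock(&phydev->lock);
        err = _phy_start_aneg(phydev);
        mutex_unlock(&phydev->lock);
        return err;
}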
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ee584fa8b76d..d845afab1af7 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -700,7 +700,7 @@ static int phy_check_link_status(struct phy_device *phydev)
}
/**
- * phy_start_aneg - start auto-negotiation for this PHY device
+ * _phy_start_aneg - start auto-negotiation for this PHY device
* @phydev: the phy_device struct
*
* Description: Sanitizes the settings (if we're not autonegotiating
@@ -708,25 +708,43 @@ static int phy_check_link_status(struct phy_device *phydev)
* If the PHYCONTROL Layer is operating, we change the state to
* reflect the beginning of Auto-negotiation or forcing.
*/
-int phy_start_aneg(struct phy_device *phydev)
+static int _phy_start_aneg(struct phy_device *phydev)
{
int err;
+ lockdep_assert_held(&phydev->lock);
+
if (!phydev->drv)
return -EIO;
- mutex_lock(&phydev->lock);
-
if (AUTONEG_DISABLE == phydev->autoneg)
phy_sanitize_settings(phydev);
err = phy_config_aneg(phydev);
if (err < 0)
- goto out_unlock;
+ return err;
if (phy_is_started(phydev))
err = phy_check_link_status(phydev);
-out_unlock:
+
+ return err;
+}
+
+/**
+ * phy_start_aneg - start auto-negotiation for this PHY device
+ * @phydev: the phy_device struct
+ *
+ * Description: Sanitizes the settings (if we're not autonegotiating
+ * them), and then calls the driver's config_aneg function.
+ * If the PHYCONTROL Layer is operating, we change the state to
+ * reflect the beginning of Auto-negotiation or forcing.
+ */
+int phy_start_aneg(struct phy_device *phydev)
+{
+ int err;
+
+ mutex_lock(&phydev->lock);
+ err = _phy_start_aneg(phydev);
mutex_unlock(&phydev->lock);
return err;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 707293a56f95f8e7e0cfae008010c7933fb68973 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:04 +0200
Subject: [PATCH] phy: phy_start_aneg: Add an unlocked version
Split phy_start_aneg into a wrapper which takes the PHY lock, and a
helper doing the real work. This will be needed when
phy_ethtool_ksettings_set takes the lock.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ee584fa8b76d..d845afab1af7 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -700,7 +700,7 @@ static int phy_check_link_status(struct phy_device *phydev)
}
/**
- * phy_start_aneg - start auto-negotiation for this PHY device
+ * _phy_start_aneg - start auto-negotiation for this PHY device
* @phydev: the phy_device struct
*
* Description: Sanitizes the settings (if we're not autonegotiating
@@ -708,25 +708,43 @@ static int phy_check_link_status(struct phy_device *phydev)
* If the PHYCONTROL Layer is operating, we change the state to
* reflect the beginning of Auto-negotiation or forcing.
*/
-int phy_start_aneg(struct phy_device *phydev)
+static int _phy_start_aneg(struct phy_device *phydev)
{
int err;
+ lockdep_assert_held(&phydev->lock);
+
if (!phydev->drv)
return -EIO;
- mutex_lock(&phydev->lock);
-
if (AUTONEG_DISABLE == phydev->autoneg)
phy_sanitize_settings(phydev);
err = phy_config_aneg(phydev);
if (err < 0)
- goto out_unlock;
+ return err;
if (phy_is_started(phydev))
err = phy_check_link_status(phydev);
-out_unlock:
+
+ return err;
+}
+
+/**
+ * phy_start_aneg - start auto-negotiation for this PHY device
+ * @phydev: the phy_device struct
+ *
+ * Description: Sanitizes the settings (if we're not autonegotiating
+ * them), and then calls the driver's config_aneg function.
+ * If the PHYCONTROL Layer is operating, we change the state to
+ * reflect the beginning of Auto-negotiation or forcing.
+ */
+int phy_start_aneg(struct phy_device *phydev)
+{
+ int err;
+
+ mutex_lock(&phydev->lock);
+ err = _phy_start_aneg(phydev);
mutex_unlock(&phydev->lock);
return err;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 707293a56f95f8e7e0cfae008010c7933fb68973 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:04 +0200
Subject: [PATCH] phy: phy_start_aneg: Add an unlocked version
Split phy_start_aneg into a wrapper which takes the PHY lock, and a
helper doing the real work. This will be needed when
phy_ethtool_ksettings_set takes the lock.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ee584fa8b76d..d845afab1af7 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -700,7 +700,7 @@ static int phy_check_link_status(struct phy_device *phydev)
}
/**
- * phy_start_aneg - start auto-negotiation for this PHY device
+ * _phy_start_aneg - start auto-negotiation for this PHY device
* @phydev: the phy_device struct
*
* Description: Sanitizes the settings (if we're not autonegotiating
@@ -708,25 +708,43 @@ static int phy_check_link_status(struct phy_device *phydev)
* If the PHYCONTROL Layer is operating, we change the state to
* reflect the beginning of Auto-negotiation or forcing.
*/
-int phy_start_aneg(struct phy_device *phydev)
+static int _phy_start_aneg(struct phy_device *phydev)
{
int err;
+ lockdep_assert_held(&phydev->lock);
+
if (!phydev->drv)
return -EIO;
- mutex_lock(&phydev->lock);
-
if (AUTONEG_DISABLE == phydev->autoneg)
phy_sanitize_settings(phydev);
err = phy_config_aneg(phydev);
if (err < 0)
- goto out_unlock;
+ return err;
if (phy_is_started(phydev))
err = phy_check_link_status(phydev);
-out_unlock:
+
+ return err;
+}
+
+/**
+ * phy_start_aneg - start auto-negotiation for this PHY device
+ * @phydev: the phy_device struct
+ *
+ * Description: Sanitizes the settings (if we're not autonegotiating
+ * them), and then calls the driver's config_aneg function.
+ * If the PHYCONTROL Layer is operating, we change the state to
+ * reflect the beginning of Auto-negotiation or forcing.
+ */
+int phy_start_aneg(struct phy_device *phydev)
+{
+ int err;
+
+ mutex_lock(&phydev->lock);
+ err = _phy_start_aneg(phydev);
mutex_unlock(&phydev->lock);
return err;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c10a485c3de5ccbf1fff65a382cebcb2730c6b06 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:02 +0200
Subject: [PATCH] phy: phy_ethtool_ksettings_get: Lock the phy for consistency
The PHY structure should be locked while copying information out if
it, otherwise there is no guarantee of self consistency. Without the
lock the PHY state machine could be updating the structure.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f124a8a58bd4..8457b829667e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -299,6 +299,7 @@ EXPORT_SYMBOL(phy_ethtool_ksettings_set);
void phy_ethtool_ksettings_get(struct phy_device *phydev,
struct ethtool_link_ksettings *cmd)
{
+ mutex_lock(&phydev->lock);
linkmode_copy(cmd->link_modes.supported, phydev->supported);
linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
linkmode_copy(cmd->link_modes.lp_advertising, phydev->lp_advertising);
@@ -317,6 +318,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
cmd->base.autoneg = phydev->autoneg;
cmd->base.eth_tp_mdix_ctrl = phydev->mdix_ctrl;
cmd->base.eth_tp_mdix = phydev->mdix;
+ mutex_unlock(&phydev->lock);
}
EXPORT_SYMBOL(phy_ethtool_ksettings_get);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c10a485c3de5ccbf1fff65a382cebcb2730c6b06 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:02 +0200
Subject: [PATCH] phy: phy_ethtool_ksettings_get: Lock the phy for consistency
The PHY structure should be locked while copying information out of
it, otherwise there is no guarantee of self consistency. Without the
lock the PHY state machine could be updating the structure.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f124a8a58bd4..8457b829667e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -299,6 +299,7 @@ EXPORT_SYMBOL(phy_ethtool_ksettings_set);
void phy_ethtool_ksettings_get(struct phy_device *phydev,
struct ethtool_link_ksettings *cmd)
{
+ mutex_lock(&phydev->lock);
linkmode_copy(cmd->link_modes.supported, phydev->supported);
linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
linkmode_copy(cmd->link_modes.lp_advertising, phydev->lp_advertising);
@@ -317,6 +318,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
cmd->base.autoneg = phydev->autoneg;
cmd->base.eth_tp_mdix_ctrl = phydev->mdix_ctrl;
cmd->base.eth_tp_mdix = phydev->mdix;
+ mutex_unlock(&phydev->lock);
}
EXPORT_SYMBOL(phy_ethtool_ksettings_get);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c10a485c3de5ccbf1fff65a382cebcb2730c6b06 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 24 Oct 2021 21:48:02 +0200
Subject: [PATCH] phy: phy_ethtool_ksettings_get: Lock the phy for consistency
The PHY structure should be locked while copying information out of
it, otherwise there is no guarantee of self consistency. Without the
lock the PHY state machine could be updating the structure.
Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f124a8a58bd4..8457b829667e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -299,6 +299,7 @@ EXPORT_SYMBOL(phy_ethtool_ksettings_set);
void phy_ethtool_ksettings_get(struct phy_device *phydev,
struct ethtool_link_ksettings *cmd)
{
+ mutex_lock(&phydev->lock);
linkmode_copy(cmd->link_modes.supported, phydev->supported);
linkmode_copy(cmd->link_modes.advertising, phydev->advertising);
linkmode_copy(cmd->link_modes.lp_advertising, phydev->lp_advertising);
@@ -317,6 +318,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
cmd->base.autoneg = phydev->autoneg;
cmd->base.eth_tp_mdix_ctrl = phydev->mdix_ctrl;
cmd->base.eth_tp_mdix = phydev->mdix;
+ mutex_unlock(&phydev->lock);
}
EXPORT_SYMBOL(phy_ethtool_ksettings_get);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8684db191e4164f3f5f3ad7dec04a6734c25f1c Mon Sep 17 00:00:00 2001
From: Yuiko Oshino <yuiko.oshino@microchip.com>
Date: Wed, 27 Oct 2021 14:23:02 -0400
Subject: [PATCH] net: ethernet: microchip: lan743x: Fix skb allocation failure
The driver allocates skbs during ndo_open with GFP_ATOMIC, which has a
high chance of failure when there are multiple instances. GFP_KERNEL is
sufficient during open; use GFP_ATOMIC only from interrupt context.
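The mechanical shape of the fix, per the diff below: thread a gfp_t
through the ring-element allocator so that process context can use a
sleeping allocation and only the atomic path keeps GFP_ATOMIC:

/* ndo_open / ring init: process context, allowed to sleep */
ret = lan743x_rx_init_ring_element(rx, index, GFP_KERNEL);

/* NAPI/interrupt refill path: must not sleep */
ret = lan743x_rx_init_ring_element(rx, rx->last_head,
                                   GFP_ATOMIC | GFP_DMA);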
Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index c4f32c7e0db1..4d5a5d6595b3 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1944,7 +1944,8 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}
-static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
+static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
+ gfp_t gfp)
{
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
@@ -1958,7 +1959,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
+ skb = __netdev_alloc_skb(netdev, buffer_length, gfp);
if (!skb)
return -ENOMEM;
dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
@@ -2120,7 +2121,8 @@ static int lan743x_rx_process_buffer(struct lan743x_rx *rx)
/* save existing skb, allocate new skb and map to dma */
skb = buffer_info->skb;
- if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ if (lan743x_rx_init_ring_element(rx, rx->last_head,
+ GFP_ATOMIC | GFP_DMA)) {
/* failed to allocate next skb.
* Memory is very low.
* Drop this packet and reuse buffer.
@@ -2335,13 +2337,16 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)
rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- ret = lan743x_rx_init_ring_element(rx, index);
+ ret = lan743x_rx_init_ring_element(rx, index, GFP_KERNEL);
if (ret)
goto cleanup;
}
return 0;
cleanup:
+ netif_warn(rx->adapter, ifup, rx->adapter->netdev,
+ "Error allocating memory for LAN743x\n");
+
lan743x_rx_ring_cleanup(rx);
return ret;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8684db191e4164f3f5f3ad7dec04a6734c25f1c Mon Sep 17 00:00:00 2001
From: Yuiko Oshino <yuiko.oshino@microchip.com>
Date: Wed, 27 Oct 2021 14:23:02 -0400
Subject: [PATCH] net: ethernet: microchip: lan743x: Fix skb allocation failure
The driver allocates skbs during ndo_open with GFP_ATOMIC, which has a
high chance of failure when there are multiple instances. GFP_KERNEL is
sufficient during open; use GFP_ATOMIC only from interrupt context.
Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index c4f32c7e0db1..4d5a5d6595b3 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1944,7 +1944,8 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}
-static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
+static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
+ gfp_t gfp)
{
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
@@ -1958,7 +1959,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
+ skb = __netdev_alloc_skb(netdev, buffer_length, gfp);
if (!skb)
return -ENOMEM;
dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
@@ -2120,7 +2121,8 @@ static int lan743x_rx_process_buffer(struct lan743x_rx *rx)
/* save existing skb, allocate new skb and map to dma */
skb = buffer_info->skb;
- if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ if (lan743x_rx_init_ring_element(rx, rx->last_head,
+ GFP_ATOMIC | GFP_DMA)) {
/* failed to allocate next skb.
* Memory is very low.
* Drop this packet and reuse buffer.
@@ -2335,13 +2337,16 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)
rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- ret = lan743x_rx_init_ring_element(rx, index);
+ ret = lan743x_rx_init_ring_element(rx, index, GFP_KERNEL);
if (ret)
goto cleanup;
}
return 0;
cleanup:
+ netif_warn(rx->adapter, ifup, rx->adapter->netdev,
+ "Error allocating memory for LAN743x\n");
+
lan743x_rx_ring_cleanup(rx);
return ret;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8684db191e4164f3f5f3ad7dec04a6734c25f1c Mon Sep 17 00:00:00 2001
From: Yuiko Oshino <yuiko.oshino@microchip.com>
Date: Wed, 27 Oct 2021 14:23:02 -0400
Subject: [PATCH] net: ethernet: microchip: lan743x: Fix skb allocation failure
The driver allocates skbs during ndo_open with GFP_ATOMIC, which has a
high chance of failure when there are multiple instances. GFP_KERNEL is
sufficient during open; use GFP_ATOMIC only from interrupt context.
Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index c4f32c7e0db1..4d5a5d6595b3 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1944,7 +1944,8 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}
-static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
+static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
+ gfp_t gfp)
{
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
@@ -1958,7 +1959,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
+ skb = __netdev_alloc_skb(netdev, buffer_length, gfp);
if (!skb)
return -ENOMEM;
dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
@@ -2120,7 +2121,8 @@ static int lan743x_rx_process_buffer(struct lan743x_rx *rx)
/* save existing skb, allocate new skb and map to dma */
skb = buffer_info->skb;
- if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ if (lan743x_rx_init_ring_element(rx, rx->last_head,
+ GFP_ATOMIC | GFP_DMA)) {
/* failed to allocate next skb.
* Memory is very low.
* Drop this packet and reuse buffer.
@@ -2335,13 +2337,16 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)
rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- ret = lan743x_rx_init_ring_element(rx, index);
+ ret = lan743x_rx_init_ring_element(rx, index, GFP_KERNEL);
if (ret)
goto cleanup;
}
return 0;
cleanup:
+ netif_warn(rx->adapter, ifup, rx->adapter->netdev,
+ "Error allocating memory for LAN743x\n");
+
lan743x_rx_ring_cleanup(rx);
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3bda2e5df476417b6d08967e2d84234a59d57b1c Mon Sep 17 00:00:00 2001
From: Guangbin Huang <huangguangbin2@huawei.com>
Date: Wed, 27 Oct 2021 20:11:43 +0800
Subject: [PATCH] net: hns3: fix pause config problem after autoneg disabled
If a TP port is configured by the following steps:
1. ethtool -s ethx autoneg off speed 100 duplex full
2. ethtool -A ethx rx on tx on
3. ethtool -s ethx autoneg on (rx & tx negotiated pause results are off)
4. ethtool -s ethx autoneg off speed 100 duplex full
In step 3, the driver sets the rx & tx pause parameters of the hardware
to off, as the pause parameters negotiated with the link partner are
off.
After step 4, the "ethtool -a ethx" command shows both rx and tx pause
parameters as on. However, the pause parameters of the hardware are
still off and the port actually has no flow control function.
To fix this problem, if autoneg is disabled, the driver uses its saved
parameters to restore the pause configuration of the hardware. If the
speed is not changed in this case, there is no link state change for
the phy, so the pause parameters would not take effect; we therefore
need to force the phy to go down and up.
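Condensed from hclge_restore_pauseparam() in the diff below, the
restore sequence forces a link transition so the phy pause settings
take effect:

hclge_get_pauseparam(handle, &auto_neg, &rx_pause, &tx_pause);
phy_suspend(phydev);                            /* force the link down */
phy_set_asym_pause(phydev, rx_pause, tx_pause); /* saved parameters */
phy_resume(phydev);                             /* bring the link back up */
hclge_mac_pause_setup_hw(hdev);                 /* reapply MAC pause config */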
Fixes: aacbe27e82f0 ("net: hns3: modify how pause options is displayed")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index d701451596c8..da3a593f6a56 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -568,6 +568,7 @@ struct hnae3_ae_ops {
u32 *auto_neg, u32 *rx_en, u32 *tx_en);
int (*set_pauseparam)(struct hnae3_handle *handle,
u32 auto_neg, u32 rx_en, u32 tx_en);
+ int (*restore_pauseparam)(struct hnae3_handle *handle);
int (*set_autoneg)(struct hnae3_handle *handle, bool enable);
int (*get_autoneg)(struct hnae3_handle *handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 5ebd96f6833d..7d92dd273ed7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -824,6 +824,26 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
return 0;
}
+static int hns3_set_phy_link_ksettings(struct net_device *netdev,
+ const struct ethtool_link_ksettings *cmd)
+{
+ struct hnae3_handle *handle = hns3_get_handle(netdev);
+ const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
+ int ret;
+
+ if (cmd->base.speed == SPEED_1000 &&
+ cmd->base.autoneg == AUTONEG_DISABLE)
+ return -EINVAL;
+
+ if (cmd->base.autoneg == AUTONEG_DISABLE && ops->restore_pauseparam) {
+ ret = ops->restore_pauseparam(handle);
+ if (ret)
+ return ret;
+ }
+
+ return phy_ethtool_ksettings_set(netdev->phydev, cmd);
+}
+
static int hns3_set_link_ksettings(struct net_device *netdev,
const struct ethtool_link_ksettings *cmd)
{
@@ -842,16 +862,11 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
cmd->base.autoneg, cmd->base.speed, cmd->base.duplex);
/* Only support ksettings_set for netdev with phy attached for now */
- if (netdev->phydev) {
- if (cmd->base.speed == SPEED_1000 &&
- cmd->base.autoneg == AUTONEG_DISABLE)
- return -EINVAL;
-
- return phy_ethtool_ksettings_set(netdev->phydev, cmd);
- } else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
- ops->set_phy_link_ksettings) {
+ if (netdev->phydev)
+ return hns3_set_phy_link_ksettings(netdev, cmd);
+ else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
+ ops->set_phy_link_ksettings)
return ops->set_phy_link_ksettings(handle, cmd);
- }
if (ae_dev->dev_version < HNAE3_DEVICE_VERSION_V2)
return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index dcd40cc73082..c6b9806c75a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -11021,6 +11021,35 @@ static int hclge_set_pauseparam(struct hnae3_handle *handle, u32 auto_neg,
return -EOPNOTSUPP;
}
+static int hclge_restore_pauseparam(struct hnae3_handle *handle)
+{
+ struct hclge_vport *vport = hclge_get_vport(handle);
+ struct hclge_dev *hdev = vport->back;
+ u32 auto_neg, rx_pause, tx_pause;
+ int ret;
+
+ hclge_get_pauseparam(handle, &auto_neg, &rx_pause, &tx_pause);
+ /* when autoneg is disabled, the pause setting of phy has no effect
+ * unless the link goes down.
+ */
+ ret = phy_suspend(hdev->hw.mac.phydev);
+ if (ret)
+ return ret;
+
+ phy_set_asym_pause(hdev->hw.mac.phydev, rx_pause, tx_pause);
+
+ ret = phy_resume(hdev->hw.mac.phydev);
+ if (ret)
+ return ret;
+
+ ret = hclge_mac_pause_setup_hw(hdev);
+ if (ret)
+ dev_err(&hdev->pdev->dev,
+ "restore pauseparam error, ret = %d.\n", ret);
+
+ return ret;
+}
+
static void hclge_get_ksettings_an_result(struct hnae3_handle *handle,
u8 *auto_neg, u32 *speed, u8 *duplex)
{
@@ -12984,6 +13013,7 @@ static const struct hnae3_ae_ops hclge_ops = {
.halt_autoneg = hclge_halt_autoneg,
.get_pauseparam = hclge_get_pauseparam,
.set_pauseparam = hclge_set_pauseparam,
+ .restore_pauseparam = hclge_restore_pauseparam,
.set_mtu = hclge_set_mtu,
.reset_queue = hclge_reset_tqp,
.get_stats = hclge_get_stats,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index 95074e91a846..124791e4bfee 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -1435,7 +1435,7 @@ static int hclge_bp_setup_hw(struct hclge_dev *hdev, u8 tc)
return 0;
}
-static int hclge_mac_pause_setup_hw(struct hclge_dev *hdev)
+int hclge_mac_pause_setup_hw(struct hclge_dev *hdev)
{
bool tx_en, rx_en;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
index 2ee9b795f71d..4b2c3a788980 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
@@ -244,6 +244,7 @@ int hclge_tm_get_pri_weight(struct hclge_dev *hdev, u8 pri_id, u8 *weight);
int hclge_tm_get_pri_shaper(struct hclge_dev *hdev, u8 pri_id,
enum hclge_opcode_type cmd,
struct hclge_tm_shaper_para *para);
+int hclge_mac_pause_setup_hw(struct hclge_dev *hdev);
int hclge_tm_get_q_to_qs_map(struct hclge_dev *hdev, u16 q_id, u16 *qset_id);
int hclge_tm_get_q_to_tc(struct hclge_dev *hdev, u16 q_id, u8 *tc_id);
int hclge_tm_get_pg_to_pri_map(struct hclge_dev *hdev, u8 pg_id,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3bda2e5df476417b6d08967e2d84234a59d57b1c Mon Sep 17 00:00:00 2001
From: Guangbin Huang <huangguangbin2@huawei.com>
Date: Wed, 27 Oct 2021 20:11:43 +0800
Subject: [PATCH] net: hns3: fix pause config problem after autoneg disabled
If a TP port is configured by the following steps:
1. ethtool -s ethx autoneg off speed 100 duplex full
2. ethtool -A ethx rx on tx on
3. ethtool -s ethx autoneg on (rx & tx negotiated pause results are off)
4. ethtool -s ethx autoneg off speed 100 duplex full
In step 3, the driver sets the rx & tx pause parameters of the hardware
to off, as the pause parameters negotiated with the link partner are
off.
After step 4, the "ethtool -a ethx" command shows both rx and tx pause
parameters as on. However, the pause parameters of the hardware are
still off and the port actually has no flow control function.
To fix this problem, if autoneg is disabled, the driver uses its saved
parameters to restore the pause configuration of the hardware. If the
speed is not changed in this case, there is no link state change for
the phy, so the pause parameters would not take effect; we therefore
need to force the phy to go down and up.
Fixes: aacbe27e82f0 ("net: hns3: modify how pause options is displayed")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index d701451596c8..da3a593f6a56 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -568,6 +568,7 @@ struct hnae3_ae_ops {
u32 *auto_neg, u32 *rx_en, u32 *tx_en);
int (*set_pauseparam)(struct hnae3_handle *handle,
u32 auto_neg, u32 rx_en, u32 tx_en);
+ int (*restore_pauseparam)(struct hnae3_handle *handle);
int (*set_autoneg)(struct hnae3_handle *handle, bool enable);
int (*get_autoneg)(struct hnae3_handle *handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 5ebd96f6833d..7d92dd273ed7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -824,6 +824,26 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
return 0;
}
+static int hns3_set_phy_link_ksettings(struct net_device *netdev,
+ const struct ethtool_link_ksettings *cmd)
+{
+ struct hnae3_handle *handle = hns3_get_handle(netdev);
+ const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
+ int ret;
+
+ if (cmd->base.speed == SPEED_1000 &&
+ cmd->base.autoneg == AUTONEG_DISABLE)
+ return -EINVAL;
+
+ if (cmd->base.autoneg == AUTONEG_DISABLE && ops->restore_pauseparam) {
+ ret = ops->restore_pauseparam(handle);
+ if (ret)
+ return ret;
+ }
+
+ return phy_ethtool_ksettings_set(netdev->phydev, cmd);
+}
+
static int hns3_set_link_ksettings(struct net_device *netdev,
const struct ethtool_link_ksettings *cmd)
{
@@ -842,16 +862,11 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
cmd->base.autoneg, cmd->base.speed, cmd->base.duplex);
/* Only support ksettings_set for netdev with phy attached for now */
- if (netdev->phydev) {
- if (cmd->base.speed == SPEED_1000 &&
- cmd->base.autoneg == AUTONEG_DISABLE)
- return -EINVAL;
-
- return phy_ethtool_ksettings_set(netdev->phydev, cmd);
- } else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
- ops->set_phy_link_ksettings) {
+ if (netdev->phydev)
+ return hns3_set_phy_link_ksettings(netdev, cmd);
+ else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
+ ops->set_phy_link_ksettings)
return ops->set_phy_link_ksettings(handle, cmd);
- }
if (ae_dev->dev_version < HNAE3_DEVICE_VERSION_V2)
return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index dcd40cc73082..c6b9806c75a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -11021,6 +11021,35 @@ static int hclge_set_pauseparam(struct hnae3_handle *handle, u32 auto_neg,
return -EOPNOTSUPP;
}
+static int hclge_restore_pauseparam(struct hnae3_handle *handle)
+{
+ struct hclge_vport *vport = hclge_get_vport(handle);
+ struct hclge_dev *hdev = vport->back;
+ u32 auto_neg, rx_pause, tx_pause;
+ int ret;
+
+ hclge_get_pauseparam(handle, &auto_neg, &rx_pause, &tx_pause);
+ /* when autoneg is disabled, the pause setting of phy has no effect
+ * unless the link goes down.
+ */
+ ret = phy_suspend(hdev->hw.mac.phydev);
+ if (ret)
+ return ret;
+
+ phy_set_asym_pause(hdev->hw.mac.phydev, rx_pause, tx_pause);
+
+ ret = phy_resume(hdev->hw.mac.phydev);
+ if (ret)
+ return ret;
+
+ ret = hclge_mac_pause_setup_hw(hdev);
+ if (ret)
+ dev_err(&hdev->pdev->dev,
+ "restore pauseparam error, ret = %d.\n", ret);
+
+ return ret;
+}
+
static void hclge_get_ksettings_an_result(struct hnae3_handle *handle,
u8 *auto_neg, u32 *speed, u8 *duplex)
{
@@ -12984,6 +13013,7 @@ static const struct hnae3_ae_ops hclge_ops = {
.halt_autoneg = hclge_halt_autoneg,
.get_pauseparam = hclge_get_pauseparam,
.set_pauseparam = hclge_set_pauseparam,
+ .restore_pauseparam = hclge_restore_pauseparam,
.set_mtu = hclge_set_mtu,
.reset_queue = hclge_reset_tqp,
.get_stats = hclge_get_stats,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index 95074e91a846..124791e4bfee 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -1435,7 +1435,7 @@ static int hclge_bp_setup_hw(struct hclge_dev *hdev, u8 tc)
return 0;
}
-static int hclge_mac_pause_setup_hw(struct hclge_dev *hdev)
+int hclge_mac_pause_setup_hw(struct hclge_dev *hdev)
{
bool tx_en, rx_en;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
index 2ee9b795f71d..4b2c3a788980 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
@@ -244,6 +244,7 @@ int hclge_tm_get_pri_weight(struct hclge_dev *hdev, u8 pri_id, u8 *weight);
int hclge_tm_get_pri_shaper(struct hclge_dev *hdev, u8 pri_id,
enum hclge_opcode_type cmd,
struct hclge_tm_shaper_para *para);
+int hclge_mac_pause_setup_hw(struct hclge_dev *hdev);
int hclge_tm_get_q_to_qs_map(struct hclge_dev *hdev, u16 q_id, u16 *qset_id);
int hclge_tm_get_q_to_tc(struct hclge_dev *hdev, u16 q_id, u8 *tc_id);
int hclge_tm_get_pg_to_pri_map(struct hclge_dev *hdev, u8 pg_id,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ace19b992436a257d9a793672e57abc28fe83e2e Mon Sep 17 00:00:00 2001
From: Trevor Woerner <twoerner@gmail.com>
Date: Sun, 24 Oct 2021 13:50:02 -0400
Subject: [PATCH] net: nxp: lpc_eth.c: avoid hang when bringing interface down
A hard hang is observed whenever the ethernet interface is brought
down. If the PHY is stopped before the LPC core block is reset,
the SoC will hang. Comparing lpc_eth_close() and lpc_eth_open() I
re-arranged the ordering of the function calls in lpc_eth_close() to
reset the hardware before stopping the PHY.
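The safe ordering, condensed from the diff below: reset the controller
while the PHY is still running, and stop the PHY only afterwards:

spin_lock_irqsave(&pldat->lock, flags);
__lpc_eth_reset(pldat);                 /* reset the LPC core block first */
netif_carrier_off(ndev);
spin_unlock_irqrestore(&pldat->lock, flags);

if (ndev->phydev)
        phy_stop(ndev->phydev);         /* only then stop the PHY */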
Fixes: b7370112f519 ("lpc32xx: Added ethernet driver")
Signed-off-by: Trevor Woerner <twoerner@gmail.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
index d29fe562b3de..c910fa2f40a4 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -1015,9 +1015,6 @@ static int lpc_eth_close(struct net_device *ndev)
napi_disable(&pldat->napi);
netif_stop_queue(ndev);
- if (ndev->phydev)
- phy_stop(ndev->phydev);
-
spin_lock_irqsave(&pldat->lock, flags);
__lpc_eth_reset(pldat);
netif_carrier_off(ndev);
@@ -1025,6 +1022,8 @@ static int lpc_eth_close(struct net_device *ndev)
writel(0, LPC_ENET_MAC2(pldat->net_base));
spin_unlock_irqrestore(&pldat->lock, flags);
+ if (ndev->phydev)
+ phy_stop(ndev->phydev);
clk_disable_unprepare(pldat->clk);
return 0;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 759635760a804b0d8ad0cc677b650f1544cae22f Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch@nvidia.com>
Date: Sun, 24 Oct 2021 09:40:14 +0300
Subject: [PATCH] mlxsw: pci: Recycle received packet upon allocation failure
When the driver fails to allocate a new Rx buffer, it passes an empty Rx
descriptor (containing a zero address and size) to the device and marks it
as invalid by setting the skb pointer in the descriptor's metadata to
NULL.
After processing enough Rx descriptors, the driver will try to process
the invalid descriptor, but will return immediately seeing that the skb
pointer is NULL. Since the driver no longer passes new Rx descriptors to
the device, the Rx queue will eventually become full and the device will
start to drop packets.
Fix this by recycling the received packet if allocation of the new
packet failed. This means that allocation is no longer performed at the
end of the Rx routine, but at the start, before tearing down the DMA
mapping of the received packet.
Remove the comment about the descriptor being zeroed as it is no longer
correct. This is OK because we either use the descriptor as-is (when
recycling) or overwrite its address and size fields with those of the
newly allocated Rx buffer.
The issue was discovered when a process ("perf") consumed too much
memory and put the system under memory pressure. It can be reproduced by
injecting slab allocation failures [1]. After the fix, the Rx queue no
longer comes to a halt.
[1]
# echo 10 > /sys/kernel/debug/failslab/times
# echo 1000 > /sys/kernel/debug/failslab/interval
# echo 100 > /sys/kernel/debug/failslab/probability
FAULT_INJECTION: forcing a failure.
name failslab, interval 1000, probability 100, space 0, times 8
[...]
Call Trace:
<IRQ>
dump_stack_lvl+0x34/0x44
should_fail.cold+0x32/0x37
should_failslab+0x5/0x10
kmem_cache_alloc_node+0x23/0x190
__alloc_skb+0x1f9/0x280
__netdev_alloc_skb+0x3a/0x150
mlxsw_pci_rdq_skb_alloc+0x24/0x90
mlxsw_pci_cq_tasklet+0x3dc/0x1200
tasklet_action_common.constprop.0+0x9f/0x100
__do_softirq+0xb5/0x252
irq_exit_rcu+0x7a/0xa0
common_interrupt+0x83/0xa0
</IRQ>
asm_common_interrupt+0x1e/0x40
RIP: 0010:cpuidle_enter_state+0xc8/0x340
[...]
mlxsw_spectrum2 0000:06:00.0: Failed to alloc skb for RDQ
Fixes: eda6500a987a ("mlxsw: Add PCI bus implementation")
Signed-off-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Petr Machata <petrm(a)nvidia.com>
Link: https://lore.kernel.org/r/20211024064014.1060919-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 13b0259f7ea6..fcace73eae40 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -353,13 +353,10 @@ static int mlxsw_pci_rdq_skb_alloc(struct mlxsw_pci *mlxsw_pci,
struct sk_buff *skb;
int err;
- elem_info->u.rdq.skb = NULL;
skb = netdev_alloc_skb_ip_align(NULL, buf_len);
if (!skb)
return -ENOMEM;
- /* Assume that wqe was previously zeroed. */
-
err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, 0, skb->data,
buf_len, DMA_FROM_DEVICE);
if (err)
@@ -597,21 +594,26 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
struct pci_dev *pdev = mlxsw_pci->pdev;
struct mlxsw_pci_queue_elem_info *elem_info;
struct mlxsw_rx_info rx_info = {};
- char *wqe;
+ char wqe[MLXSW_PCI_WQE_SIZE];
struct sk_buff *skb;
u16 byte_count;
int err;
elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
- skb = elem_info->u.sdq.skb;
- if (!skb)
- return;
- wqe = elem_info->elem;
- mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, 0, DMA_FROM_DEVICE);
+ skb = elem_info->u.rdq.skb;
+ memcpy(wqe, elem_info->elem, MLXSW_PCI_WQE_SIZE);
if (q->consumer_counter++ != consumer_counter_limit)
dev_dbg_ratelimited(&pdev->dev, "Consumer counter does not match limit in RDQ\n");
+ err = mlxsw_pci_rdq_skb_alloc(mlxsw_pci, elem_info);
+ if (err) {
+ dev_err_ratelimited(&pdev->dev, "Failed to alloc skb for RDQ\n");
+ goto out;
+ }
+
+ mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, 0, DMA_FROM_DEVICE);
+
if (mlxsw_pci_cqe_lag_get(cqe_v, cqe)) {
rx_info.is_lag = true;
rx_info.u.lag_id = mlxsw_pci_cqe_lag_id_get(cqe_v, cqe);
@@ -647,10 +649,7 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
skb_put(skb, byte_count);
mlxsw_core_skb_receive(mlxsw_pci->core, skb, &rx_info);
- memset(wqe, 0, q->elem_size);
- err = mlxsw_pci_rdq_skb_alloc(mlxsw_pci, elem_info);
- if (err)
- dev_dbg_ratelimited(&pdev->dev, "Failed to alloc skb for RDQ\n");
+out:
/* Everything is set up, ring doorbell to pass elem to HW */
q->producer_counter++;
mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
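Condensed, the fixed RDQ completion path refills the descriptor before touching the received buffer, so an allocation failure simply re-posts (recycles) the old one:
/* Abridged from mlxsw_pci_cqe_rdq_handle() after the fix */
elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
skb = elem_info->u.rdq.skb;
memcpy(wqe, elem_info->elem, MLXSW_PCI_WQE_SIZE);	/* keep a copy for the unmap */
err = mlxsw_pci_rdq_skb_alloc(mlxsw_pci, elem_info);	/* refill first */
if (err)
	goto out;	/* recycle: descriptor still points at the old buffer */
mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, 0, DMA_FROM_DEVICE);
/* ... build rx_info and hand the skb to the core ... */
mlxsw_core_skb_receive(mlxsw_pci->core, skb, &rx_info);
out:
q->producer_counter++;
mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);	/* re-post either way */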
The patch below does not apply to the 4.4, 4.9, 4.14, 4.19, or 5.4-stable trees.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 689a0a9f505f7bffdefe6f17fddb41c8ab6344f6 Mon Sep 17 00:00:00 2001
From: Janusz Dziedzic <janusz.dziedzic(a)gmail.com>
Date: Sun, 24 Oct 2021 22:15:46 +0200
Subject: [PATCH] cfg80211: correct bridge/4addr mode check
Without the patch we fail:
$ sudo brctl addbr br0
$ sudo brctl addif br0 wlp1s0
$ sudo iw wlp1s0 set 4addr on
command failed: Device or resource busy (-16)
The last command failed, but the iface was already in 4addr mode.
Fixes: ad4bb6f8883a ("cfg80211: disallow bridging managed/adhoc interfaces")
Signed-off-by: Janusz Dziedzic <janusz.dziedzic(a)gmail.com>
Link: https://lore.kernel.org/r/20211024201546.614379-1-janusz.dziedzic@gmail.com
[add fixes tag, fix indentation, edit commit log]
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/net/wireless/util.c b/net/wireless/util.c
index 18dba3d7c638..a1a99a574984 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -1028,14 +1028,14 @@ int cfg80211_change_iface(struct cfg80211_registered_device *rdev,
!(rdev->wiphy.interface_modes & (1 << ntype)))
return -EOPNOTSUPP;
- /* if it's part of a bridge, reject changing type to station/ibss */
- if (netif_is_bridge_port(dev) &&
- (ntype == NL80211_IFTYPE_ADHOC ||
- ntype == NL80211_IFTYPE_STATION ||
- ntype == NL80211_IFTYPE_P2P_CLIENT))
- return -EBUSY;
-
if (ntype != otype) {
+ /* if it's part of a bridge, reject changing type to station/ibss */
+ if (netif_is_bridge_port(dev) &&
+ (ntype == NL80211_IFTYPE_ADHOC ||
+ ntype == NL80211_IFTYPE_STATION ||
+ ntype == NL80211_IFTYPE_P2P_CLIENT))
+ return -EBUSY;
+
dev->ieee80211_ptr->use_4addr = false;
dev->ieee80211_ptr->mesh_id_up_len = 0;
wdev_lock(dev->ieee80211_ptr);
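The net effect: the bridge-port rejection now only fires on a real type change, so re-applying a setting (such as 4addr) with ntype == otype no longer returns -EBUSY:
/* After the fix, the check is nested under the type-change branch */
if (ntype != otype) {
	if (netif_is_bridge_port(dev) &&
	    (ntype == NL80211_IFTYPE_ADHOC ||
	     ntype == NL80211_IFTYPE_STATION ||
	     ntype == NL80211_IFTYPE_P2P_CLIENT))
		return -EBUSY;
	/* ... reset 4addr/mesh state for the new type ... */
}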
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6f68cd634856f8ca93bafd623ba5357e0f648c68 Mon Sep 17 00:00:00 2001
From: Pavel Skripkin <paskripkin(a)gmail.com>
Date: Sun, 24 Oct 2021 16:13:56 +0300
Subject: [PATCH] net: batman-adv: fix error handling
Syzbot reported an ODEBUG warning in batadv_nc_mesh_free(). The problem was
incorrect error handling in batadv_mesh_init().
Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case
of any batadv_*_init() call failure. This approach may work well when
there is some kind of indicator that can tell which parts of batadv are
initialized, but there isn't any.
All of the above leads to cleaning up uninitialized fields. Even if we hide
the ODEBUG warning by initializing bat_priv->nc.work, syzbot was able to hit a
GPF in batadv_nc_purge_paths(), because the hash pointer is still NULL. [1]
To fix these bugs we can unwind the batadv_*_init() calls one by one.
It is a good approach for two reasons: 1) it fixes the bugs on the error
handling path, and 2) it improves performance, since we won't call unneeded
batadv_*_free() functions.
So, this patch makes all batadv_*_init() functions clean up any memory they
allocated before returning with an error, so there is no need to call the
corresponding batadv_*_free(), and open-codes batadv_mesh_free() in the
proper order to avoid touching uninitialized fields.
Link: https://lore.kernel.org/netdev/000000000000c87fbd05cef6bcb0@google.com/ [1]
Reported-and-tested-by: syzbot+28b0702ada0bf7381f58(a)syzkaller.appspotmail.com
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Pavel Skripkin <paskripkin(a)gmail.com>
Acked-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index 1669744304c5..17687848daec 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1560,10 +1560,14 @@ int batadv_bla_init(struct batadv_priv *bat_priv)
return 0;
bat_priv->bla.claim_hash = batadv_hash_new(128);
- bat_priv->bla.backbone_hash = batadv_hash_new(32);
+ if (!bat_priv->bla.claim_hash)
+ return -ENOMEM;
- if (!bat_priv->bla.claim_hash || !bat_priv->bla.backbone_hash)
+ bat_priv->bla.backbone_hash = batadv_hash_new(32);
+ if (!bat_priv->bla.backbone_hash) {
+ batadv_hash_destroy(bat_priv->bla.claim_hash);
return -ENOMEM;
+ }
batadv_hash_set_lock_class(bat_priv->bla.claim_hash,
&batadv_claim_hash_lock_class_key);
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 3ddd66e4c29e..5207cd8d6ad8 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -190,29 +190,41 @@ int batadv_mesh_init(struct net_device *soft_iface)
bat_priv->gw.generation = 0;
- ret = batadv_v_mesh_init(bat_priv);
- if (ret < 0)
- goto err;
-
ret = batadv_originator_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_orig;
+ }
ret = batadv_tt_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_tt;
+ }
+
+ ret = batadv_v_mesh_init(bat_priv);
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_v;
+ }
ret = batadv_bla_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_bla;
+ }
ret = batadv_dat_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_dat;
+ }
ret = batadv_nc_mesh_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_nc;
+ }
batadv_gw_init(bat_priv);
batadv_mcast_init(bat_priv);
@@ -222,8 +234,20 @@ int batadv_mesh_init(struct net_device *soft_iface)
return 0;
-err:
- batadv_mesh_free(soft_iface);
+err_nc:
+ batadv_dat_free(bat_priv);
+err_dat:
+ batadv_bla_free(bat_priv);
+err_bla:
+ batadv_v_mesh_free(bat_priv);
+err_v:
+ batadv_tt_free(bat_priv);
+err_tt:
+ batadv_originator_free(bat_priv);
+err_orig:
+ batadv_purge_outstanding_packets(bat_priv, NULL);
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_INACTIVE);
+
return ret;
}
diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index 9f06132e007d..0a7f1d36a6a8 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -152,8 +152,10 @@ int batadv_nc_mesh_init(struct batadv_priv *bat_priv)
&batadv_nc_coding_hash_lock_class_key);
bat_priv->nc.decoding_hash = batadv_hash_new(128);
- if (!bat_priv->nc.decoding_hash)
+ if (!bat_priv->nc.decoding_hash) {
+ batadv_hash_destroy(bat_priv->nc.coding_hash);
goto err;
+ }
batadv_hash_set_lock_class(bat_priv->nc.decoding_hash,
&batadv_nc_decoding_hash_lock_class_key);
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index e0b3dace2020..4b7ad6684bc4 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -4162,8 +4162,10 @@ int batadv_tt_init(struct batadv_priv *bat_priv)
return ret;
ret = batadv_tt_global_init(bat_priv);
- if (ret < 0)
+ if (ret < 0) {
+ batadv_tt_local_table_free(bat_priv);
return ret;
+ }
batadv_tvlv_handler_register(bat_priv, batadv_tt_tvlv_ogm_handler_v1,
batadv_tt_tvlv_unicast_handler_v1,
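The err_nc/err_dat/... ladder added to batadv_mesh_init() above follows the standard kernel unwind idiom: each failure jumps to a label that frees exactly what was initialized before the failing step, in reverse order. A minimal generic sketch of the idiom (generic names, not batman-adv functions):
int init_all(struct priv *p)
{
	int ret;
	ret = init_a(p);
	if (ret < 0)
		goto err_a;	/* nothing to free yet */
	ret = init_b(p);
	if (ret < 0)
		goto err_b;	/* b failed: only a was set up */
	return 0;
err_b:
	free_a(p);
err_a:
	return ret;
}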
The patch below does not apply to the 4.4, 4.9, 4.14, 4.19, or 5.4-stable trees.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 54713c85f536048e685258f880bf298a74c3620d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke(a)redhat.com>
Date: Tue, 26 Oct 2021 13:00:19 +0200
Subject: [PATCH] bpf: Fix potential race in tail call compatibility check
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Lorenzo noticed that the code testing for program type compatibility of
tail call maps is potentially racy in that two threads could encounter a
map with an unset type simultaneously and both return true even though they
are inserting incompatible programs.
The race window is quite small, but artificially enlarging it by adding a
usleep_range() inside the check in bpf_prog_array_compatible() makes it
trivial to trigger from userspace with a program that does, essentially:
map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
pid = fork();
if (pid) {
key = 0;
value = xdp_fd;
} else {
key = 1;
value = tc_fd;
}
err = bpf_map_update_elem(map_fd, &key, &value, 0);
While the race window is small, it has potentially serious ramifications in
that triggering it would allow a BPF program to tail call to a program of a
different type. So let's get rid of it by protecting the update with a
spinlock. The commit in the Fixes tag is the last commit that touches the
code in question.
v2:
- Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
v3:
- Put lock and the members it protects into an embedded 'owner' struct (Daniel)
Fixes: 3324b584b6f6 ("ebpf: misc core cleanup")
Reported-by: Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 020a7d5bf470..3db6f6c95489 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -929,8 +929,11 @@ struct bpf_array_aux {
* stored in the map to make sure that all callers and callees have
* the same prog type and JITed flag.
*/
- enum bpf_prog_type type;
- bool jited;
+ struct {
+ spinlock_t lock;
+ enum bpf_prog_type type;
+ bool jited;
+ } owner;
/* Programs with direct jumps into programs part of this array. */
struct list_head poke_progs;
struct bpf_map *map;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index cebd4fb06d19..447def540544 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
INIT_WORK(&aux->work, prog_array_map_clear_deferred);
INIT_LIST_HEAD(&aux->poke_progs);
mutex_init(&aux->poke_mutex);
+ spin_lock_init(&aux->owner.lock);
map = array_map_alloc(attr);
if (IS_ERR(map)) {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c1e7eb3f1876..6e3ae90ad107 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx,
bool bpf_prog_array_compatible(struct bpf_array *array,
const struct bpf_prog *fp)
{
+ bool ret;
+
if (fp->kprobe_override)
return false;
- if (!array->aux->type) {
+ spin_lock(&array->aux->owner.lock);
+
+ if (!array->aux->owner.type) {
/* There's no owner yet where we could check for
* compatibility.
*/
- array->aux->type = fp->type;
- array->aux->jited = fp->jited;
- return true;
+ array->aux->owner.type = fp->type;
+ array->aux->owner.jited = fp->jited;
+ ret = true;
+ } else {
+ ret = array->aux->owner.type == fp->type &&
+ array->aux->owner.jited == fp->jited;
}
-
- return array->aux->type == fp->type &&
- array->aux->jited == fp->jited;
+ spin_unlock(&array->aux->owner.lock);
+ return ret;
}
static int bpf_check_tail_call(const struct bpf_prog *fp)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9dab49d3f394..1cad6979a0d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
array = container_of(map, struct bpf_array, map);
- type = array->aux->type;
- jited = array->aux->jited;
+ spin_lock(&array->aux->owner.lock);
+ type = array->aux->owner.type;
+ jited = array->aux->owner.jited;
+ spin_unlock(&array->aux->owner.lock);
}
seq_printf(m,
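Condensed, the claim-or-compare step in bpf_prog_array_compatible() is now atomic under the new owner.lock:
spin_lock(&array->aux->owner.lock);
if (!array->aux->owner.type) {
	/* first program inserted claims the map's owner type */
	array->aux->owner.type = fp->type;
	array->aux->owner.jited = fp->jited;
	ret = true;
} else {
	/* later programs must match the claimed type and JIT flag */
	ret = array->aux->owner.type == fp->type &&
	      array->aux->owner.jited == fp->jited;
}
spin_unlock(&array->aux->owner.lock);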
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 54713c85f536048e685258f880bf298a74c3620d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke(a)redhat.com>
Date: Tue, 26 Oct 2021 13:00:19 +0200
Subject: [PATCH] bpf: Fix potential race in tail call compatibility check
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Lorenzo noticed that the code testing for program type compatibility of
tail call maps is potentially racy in that two threads could encounter a
map with an unset type simultaneously and both return true even though they
are inserting incompatible programs.
The race window is quite small, but artificially enlarging it by adding a
usleep_range() inside the check in bpf_prog_array_compatible() makes it
trivial to trigger from userspace with a program that does, essentially:
map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
pid = fork();
if (pid) {
key = 0;
value = xdp_fd;
} else {
key = 1;
value = tc_fd;
}
err = bpf_map_update_elem(map_fd, &key, &value, 0);
While the race window is small, it has potentially serious ramifications in
that triggering it would allow a BPF program to tail call to a program of a
different type. So let's get rid of it by protecting the update with a
spinlock. The commit in the Fixes tag is the last commit that touches the
code in question.
v2:
- Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
v3:
- Put lock and the members it protects into an embedded 'owner' struct (Daniel)
Fixes: 3324b584b6f6 ("ebpf: misc core cleanup")
Reported-by: Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 020a7d5bf470..3db6f6c95489 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -929,8 +929,11 @@ struct bpf_array_aux {
* stored in the map to make sure that all callers and callees have
* the same prog type and JITed flag.
*/
- enum bpf_prog_type type;
- bool jited;
+ struct {
+ spinlock_t lock;
+ enum bpf_prog_type type;
+ bool jited;
+ } owner;
/* Programs with direct jumps into programs part of this array. */
struct list_head poke_progs;
struct bpf_map *map;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index cebd4fb06d19..447def540544 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
INIT_WORK(&aux->work, prog_array_map_clear_deferred);
INIT_LIST_HEAD(&aux->poke_progs);
mutex_init(&aux->poke_mutex);
+ spin_lock_init(&aux->owner.lock);
map = array_map_alloc(attr);
if (IS_ERR(map)) {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c1e7eb3f1876..6e3ae90ad107 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx,
bool bpf_prog_array_compatible(struct bpf_array *array,
const struct bpf_prog *fp)
{
+ bool ret;
+
if (fp->kprobe_override)
return false;
- if (!array->aux->type) {
+ spin_lock(&array->aux->owner.lock);
+
+ if (!array->aux->owner.type) {
/* There's no owner yet where we could check for
* compatibility.
*/
- array->aux->type = fp->type;
- array->aux->jited = fp->jited;
- return true;
+ array->aux->owner.type = fp->type;
+ array->aux->owner.jited = fp->jited;
+ ret = true;
+ } else {
+ ret = array->aux->owner.type == fp->type &&
+ array->aux->owner.jited == fp->jited;
}
-
- return array->aux->type == fp->type &&
- array->aux->jited == fp->jited;
+ spin_unlock(&array->aux->owner.lock);
+ return ret;
}
static int bpf_check_tail_call(const struct bpf_prog *fp)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9dab49d3f394..1cad6979a0d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
array = container_of(map, struct bpf_array, map);
- type = array->aux->type;
- jited = array->aux->jited;
+ spin_lock(&array->aux->owner.lock);
+ type = array->aux->owner.type;
+ jited = array->aux->owner.jited;
+ spin_unlock(&array->aux->owner.lock);
}
seq_printf(m,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 54713c85f536048e685258f880bf298a74c3620d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke(a)redhat.com>
Date: Tue, 26 Oct 2021 13:00:19 +0200
Subject: [PATCH] bpf: Fix potential race in tail call compatibility check
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Lorenzo noticed that the code testing for program type compatibility of
tail call maps is potentially racy in that two threads could encounter a
map with an unset type simultaneously and both return true even though they
are inserting incompatible programs.
The race window is quite small, but artificially enlarging it by adding a
usleep_range() inside the check in bpf_prog_array_compatible() makes it
trivial to trigger from userspace with a program that does, essentially:
map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
pid = fork();
if (pid) {
key = 0;
value = xdp_fd;
} else {
key = 1;
value = tc_fd;
}
err = bpf_map_update_elem(map_fd, &key, &value, 0);
While the race window is small, it has potentially serious ramifications in
that triggering it would allow a BPF program to tail call to a program of a
different type. So let's get rid of it by protecting the update with a
spinlock. The commit in the Fixes tag is the last commit that touches the
code in question.
v2:
- Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
v3:
- Put lock and the members it protects into an embedded 'owner' struct (Daniel)
Fixes: 3324b584b6f6 ("ebpf: misc core cleanup")
Reported-by: Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 020a7d5bf470..3db6f6c95489 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -929,8 +929,11 @@ struct bpf_array_aux {
* stored in the map to make sure that all callers and callees have
* the same prog type and JITed flag.
*/
- enum bpf_prog_type type;
- bool jited;
+ struct {
+ spinlock_t lock;
+ enum bpf_prog_type type;
+ bool jited;
+ } owner;
/* Programs with direct jumps into programs part of this array. */
struct list_head poke_progs;
struct bpf_map *map;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index cebd4fb06d19..447def540544 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
INIT_WORK(&aux->work, prog_array_map_clear_deferred);
INIT_LIST_HEAD(&aux->poke_progs);
mutex_init(&aux->poke_mutex);
+ spin_lock_init(&aux->owner.lock);
map = array_map_alloc(attr);
if (IS_ERR(map)) {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c1e7eb3f1876..6e3ae90ad107 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx,
bool bpf_prog_array_compatible(struct bpf_array *array,
const struct bpf_prog *fp)
{
+ bool ret;
+
if (fp->kprobe_override)
return false;
- if (!array->aux->type) {
+ spin_lock(&array->aux->owner.lock);
+
+ if (!array->aux->owner.type) {
/* There's no owner yet where we could check for
* compatibility.
*/
- array->aux->type = fp->type;
- array->aux->jited = fp->jited;
- return true;
+ array->aux->owner.type = fp->type;
+ array->aux->owner.jited = fp->jited;
+ ret = true;
+ } else {
+ ret = array->aux->owner.type == fp->type &&
+ array->aux->owner.jited == fp->jited;
}
-
- return array->aux->type == fp->type &&
- array->aux->jited == fp->jited;
+ spin_unlock(&array->aux->owner.lock);
+ return ret;
}
static int bpf_check_tail_call(const struct bpf_prog *fp)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9dab49d3f394..1cad6979a0d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
array = container_of(map, struct bpf_array, map);
- type = array->aux->type;
- jited = array->aux->jited;
+ spin_lock(&array->aux->owner.lock);
+ type = array->aux->owner.type;
+ jited = array->aux->owner.jited;
+ spin_unlock(&array->aux->owner.lock);
}
seq_printf(m,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 54713c85f536048e685258f880bf298a74c3620d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke(a)redhat.com>
Date: Tue, 26 Oct 2021 13:00:19 +0200
Subject: [PATCH] bpf: Fix potential race in tail call compatibility check
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Lorenzo noticed that the code testing for program type compatibility of
tail call maps is potentially racy in that two threads could encounter a
map with an unset type simultaneously and both return true even though they
are inserting incompatible programs.
The race window is quite small, but artificially enlarging it by adding a
usleep_range() inside the check in bpf_prog_array_compatible() makes it
trivial to trigger from userspace with a program that does, essentially:
map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
pid = fork();
if (pid) {
key = 0;
value = xdp_fd;
} else {
key = 1;
value = tc_fd;
}
err = bpf_map_update_elem(map_fd, &key, &value, 0);
While the race window is small, it has potentially serious ramifications in
that triggering it would allow a BPF program to tail call to a program of a
different type. So let's get rid of it by protecting the update with a
spinlock. The commit in the Fixes tag is the last commit that touches the
code in question.
v2:
- Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
v3:
- Put lock and the members it protects into an embedded 'owner' struct (Daniel)
Fixes: 3324b584b6f6 ("ebpf: misc core cleanup")
Reported-by: Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 020a7d5bf470..3db6f6c95489 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -929,8 +929,11 @@ struct bpf_array_aux {
* stored in the map to make sure that all callers and callees have
* the same prog type and JITed flag.
*/
- enum bpf_prog_type type;
- bool jited;
+ struct {
+ spinlock_t lock;
+ enum bpf_prog_type type;
+ bool jited;
+ } owner;
/* Programs with direct jumps into programs part of this array. */
struct list_head poke_progs;
struct bpf_map *map;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index cebd4fb06d19..447def540544 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
INIT_WORK(&aux->work, prog_array_map_clear_deferred);
INIT_LIST_HEAD(&aux->poke_progs);
mutex_init(&aux->poke_mutex);
+ spin_lock_init(&aux->owner.lock);
map = array_map_alloc(attr);
if (IS_ERR(map)) {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c1e7eb3f1876..6e3ae90ad107 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1823,20 +1823,26 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx,
bool bpf_prog_array_compatible(struct bpf_array *array,
const struct bpf_prog *fp)
{
+ bool ret;
+
if (fp->kprobe_override)
return false;
- if (!array->aux->type) {
+ spin_lock(&array->aux->owner.lock);
+
+ if (!array->aux->owner.type) {
/* There's no owner yet where we could check for
* compatibility.
*/
- array->aux->type = fp->type;
- array->aux->jited = fp->jited;
- return true;
+ array->aux->owner.type = fp->type;
+ array->aux->owner.jited = fp->jited;
+ ret = true;
+ } else {
+ ret = array->aux->owner.type == fp->type &&
+ array->aux->owner.jited == fp->jited;
}
-
- return array->aux->type == fp->type &&
- array->aux->jited == fp->jited;
+ spin_unlock(&array->aux->owner.lock);
+ return ret;
}
static int bpf_check_tail_call(const struct bpf_prog *fp)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9dab49d3f394..1cad6979a0d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -543,8 +543,10 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
array = container_of(map, struct bpf_array, map);
- type = array->aux->type;
- jited = array->aux->jited;
+ spin_lock(&array->aux->owner.lock);
+ type = array->aux->owner.type;
+ jited = array->aux->owner.jited;
+ spin_unlock(&array->aux->owner.lock);
}
seq_printf(m,
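Fleshed out, the reproducer sketched in the commit message above might look roughly like this (a minimal sketch using the legacy libbpf helpers named in the snippet, which are deprecated in current libbpf; xdp_fd and tc_fd are assumed to be descriptors of already-loaded XDP and TC programs, not shown here):

#include <bpf/bpf.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Race two tail-call map updates from parent and child. Without the
 * owner lock, both updates can pass the compatibility check even
 * though the two programs have different types. */
static int try_race(int xdp_fd, int tc_fd)
{
	int map_fd, key, value, err, status = 0;
	pid_t pid;

	/* PROG_ARRAY: 4-byte key, 4-byte value (prog fd), 2 slots */
	map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
	if (map_fd < 0)
		return map_fd;

	pid = fork();
	if (pid) {
		key = 0;
		value = xdp_fd;
	} else {
		key = 1;
		value = tc_fd;
	}

	err = bpf_map_update_elem(map_fd, &key, &value, 0);
	if (!pid)
		_exit(err ? 1 : 0);

	waitpid(pid, &status, 0);
	/* Both succeeding (0 and 0) would mean the race was hit. */
	printf("parent update: %d, child exit: %d\n", err, WEXITSTATUS(status));
	return 0;
}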
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fcf918ffd3b35e288097036c04af7446b2c6f2f1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Thu, 14 Oct 2021 12:09:39 +0300
Subject: [PATCH] drm/i915: Convert unconditional clflush to
drm_clflush_virt_range()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This one is apparently a "clflush for good measure", so there is a bit more
justification (if you can call it that) than for some of the others.
Convert to drm_clflush_virt_range() again so that machines without
clflush will survive the ordeal.
Cc: stable(a)vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)intel.com> #v1
Fixes: 12ca695d2c1e ("drm/i915: Do not share hwsp across contexts any more, v8.")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-3-ville.…
Reviewed-by: Dave Airlie <airlied(a)redhat.com>
(cherry picked from commit af7b6d234eefa30c461cc16912bafb32b9e6141c)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 1257f4f11e66..23d7328892ed 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -225,7 +225,7 @@ void intel_timeline_reset_seqno(const struct intel_timeline *tl)
memset(hwsp_seqno + 1, 0, TIMELINE_SEQNO_BYTES - sizeof(*hwsp_seqno));
WRITE_ONCE(*hwsp_seqno, tl->seqno);
- clflush(hwsp_seqno);
+ drm_clflush_virt_range(hwsp_seqno, TIMELINE_SEQNO_BYTES);
}
void intel_timeline_enter(struct intel_timeline *tl)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4aeaa06d45e90f9b279f0b09de84bd00006e733 Mon Sep 17 00:00:00 2001
From: Yang Shi <shy828301(a)gmail.com>
Date: Thu, 28 Oct 2021 14:36:30 -0700
Subject: [PATCH] mm: khugepaged: skip huge page collapse for special files
The read-only THP for filesystems will collapse THP for files opened
readonly and mapped with VM_EXEC. The intended usecase is to avoid TLB
misses for large text segments. But it doesn't restrict the file types,
so a THP could be collapsed for a non-regular file, for example a block
device, if it is opened readonly and mapped with EXEC permission. This
may cause bugs, like [1] and [2].
This is definitely not the intended usecase, so just collapse THP for
regular files in order to close the attack surface.
[shy828301(a)gmail.com: fix vm_file check [3]]
Link: https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx… [1]
Link: https://lore.kernel.org/linux-mm/000000000000c6a82505ce284e4c@google.com/ [2]
Link: https://lkml.kernel.org/r/CAHbLzkqTW9U3VvTu1Ki5v_cLRC9gHW+znBukg_ycergE0JWj… [3]
Link: https://lkml.kernel.org/r/20211027195221.3825-1-shy828301@gmail.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Reported-by: syzbot+aae069be1de40fb11825(a)syzkaller.appspotmail.com
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Andrea Righi <andrea.righi(a)canonical.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 48de4e1b0783..8a8b3aa92937 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -445,22 +445,25 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
if (!transhuge_vma_enabled(vma, vm_flags))
return false;
+ if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
+ vma->vm_pgoff, HPAGE_PMD_NR))
+ return false;
+
/* Enabled via shmem mount options or sysfs settings. */
- if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
- return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
- HPAGE_PMD_NR);
- }
+ if (shmem_file(vma->vm_file))
+ return shmem_huge_enabled(vma);
/* THP settings require madvise. */
if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
return false;
- /* Read-only file mappings need to be aligned for THP to work. */
+ /* Only regular file is valid */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
- !inode_is_open_for_write(vma->vm_file->f_inode) &&
(vm_flags & VM_EXEC)) {
- return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
- HPAGE_PMD_NR);
+ struct inode *inode = vma->vm_file->f_inode;
+
+ return !inode_is_open_for_write(inode) &&
+ S_ISREG(inode->i_mode);
}
if (!vma->anon_vma || vma->vm_ops)
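For illustration, the kind of mapping the patch excludes could be set up from userspace roughly as follows (a hedged sketch only: the device path and the 2 MiB PMD size are assumptions, and whether a collapse is actually attempted depends on the khugepaged configuration):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* A non-regular file: a block device node, opened readonly */
	int fd = open("/dev/ram0", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Read-only, executable private mapping, PMD-sized (2 MiB here) */
	void *p = mmap(NULL, 2UL << 20, PROT_READ | PROT_EXEC,
		       MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Invite khugepaged; before the fix this VMA was a candidate */
	madvise(p, 2UL << 20, MADV_HUGEPAGE);
	pause();
	return 0;
}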
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 74c42e1baacf206338b1dd6b6199ac964512b5bb Mon Sep 17 00:00:00 2001
From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Date: Thu, 28 Oct 2021 14:36:27 -0700
Subject: [PATCH] mm, thp: bail out early in collapse_file for writeback page
Currently collapse_file does not explicitly check PG_writeback; instead,
page_has_private and try_to_release_page are used to filter writeback
pages. This does not work for xfs with a blocksize equal to or larger
than the pagesize, because in that case xfs has no page->private.
This makes collapse_file bail out early for writeback pages. Otherwise,
xfs end_page_writeback will panic as follows.
page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:ffff0003f88c86a8 index:0x0 pfn:0x84ef32
aops:xfs_address_space_operations [xfs] ino:30000b7 dentry name:"libtest.so"
flags: 0x57fffe0000008027(locked|referenced|uptodate|active|writeback)
raw: 57fffe0000008027 ffff80001b48bc28 ffff80001b48bc28 ffff0003f88c86a8
raw: 0000000000000000 0000000000000000 00000000ffffffff ffff0000c3e9a000
page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u))
page->mem_cgroup:ffff0000c3e9a000
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1212!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in:
BUG: Bad page state in process khugepaged pfn:84ef32
xfs(E)
page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:0 index:0x0 pfn:0x84ef32
libcrc32c(E) rfkill(E) aes_ce_blk(E) crypto_simd(E) ...
CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Tainted: ...
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
Call trace:
end_page_writeback+0x1c0/0x214
iomap_finish_page_writeback+0x13c/0x204
iomap_finish_ioend+0xe8/0x19c
iomap_writepage_end_bio+0x38/0x50
bio_endio+0x168/0x1ec
blk_update_request+0x278/0x3f0
blk_mq_end_request+0x34/0x15c
virtblk_request_done+0x38/0x74 [virtio_blk]
blk_done_softirq+0xc4/0x110
__do_softirq+0x128/0x38c
__irq_exit_rcu+0x118/0x150
irq_exit+0x1c/0x30
__handle_domain_irq+0x8c/0xf0
gic_handle_irq+0x84/0x108
el1_irq+0xcc/0x180
arch_cpu_idle+0x18/0x40
default_idle_call+0x4c/0x1a0
cpuidle_idle_call+0x168/0x1e0
do_idle+0xb4/0x104
cpu_startup_entry+0x30/0x9c
secondary_start_kernel+0x104/0x180
Code: d4210000 b0006161 910c8021 94013f4d (d4210000)
---[ end trace 4a88c6a074082f8c ]---
Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
Link: https://lkml.kernel.org/r/20211022023052.33114-1-rongwei.wang@linux.alibaba…
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu(a)linux.alibaba.com>
Suggested-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Song Liu <song(a)kernel.org>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 045cc579f724..48de4e1b0783 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1763,6 +1763,10 @@ static void collapse_file(struct mm_struct *mm,
filemap_flush(mapping);
result = SCAN_FAIL;
goto xa_unlocked;
+ } else if (PageWriteback(page)) {
+ xas_unlock_irq(&xas);
+ result = SCAN_FAIL;
+ goto xa_unlocked;
} else if (trylock_page(page)) {
get_page(page);
xas_unlock_irq(&xas);
@@ -1798,7 +1802,8 @@ static void collapse_file(struct mm_struct *mm,
goto out_unlock;
}
- if (!is_shmem && PageDirty(page)) {
+ if (!is_shmem && (PageDirty(page) ||
+ PageWriteback(page))) {
/*
* khugepaged only works on read-only fd, so this
* page is dirty because it hasn't been flushed
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From eac96c3efdb593df1a57bb5b95dbe037bfa9a522 Mon Sep 17 00:00:00 2001
From: Yang Shi <shy828301(a)gmail.com>
Date: Thu, 28 Oct 2021 14:36:11 -0700
Subject: [PATCH] mm: filemap: check if THP has hwpoisoned subpage for PMD page
fault
When handling a shmem page fault, a THP with a corrupted subpage could be
PMD-mapped if certain conditions are satisfied. But the kernel is supposed
to send SIGBUS when trying to map a hwpoisoned page.
There are two paths which may do PMD map: fault around and regular
fault.
Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
codepaths") the thing was even worse in fault around path. The THP
could be PMD mapped as long as the VMA fits regardless what subpage is
accessed and corrupted. After this commit as long as head page is not
corrupted the THP could be PMD mapped.
In the regular fault path the THP could be PMD mapped as long as the
corrupted page is not accessed and the VMA fits.
This loophole could be fixed by iterating every subpage to check if any
of them is hwpoisoned or not, but it is somewhat costly in page fault
path.
So introduce a new page flag, HasHWPoisoned, on the first tail
page. It indicates that the THP has hwpoisoned subpage(s). It is set once
any subpage of the THP is found hwpoisoned by memory failure and the
refcount has been bumped successfully, then cleared when the THP is freed
or split.
The soft offline path doesn't need this, since the soft offline handler
just marks a subpage hwpoisoned when the subpage is migrated successfully;
but the shmem THP didn't get split then migrated at all.
Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@gmail.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a558d67ee86f..fbfd3fad48f2 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -171,6 +171,15 @@ enum pageflags {
/* Compound pages. Stored in first tail page's flags */
PG_double_map = PG_workingset,
+#ifdef CONFIG_MEMORY_FAILURE
+ /*
+ * Compound pages. Stored in first tail page's flags.
+ * Indicates that at least one subpage is hwpoisoned in the
+ * THP.
+ */
+ PG_has_hwpoisoned = PG_mappedtodisk,
+#endif
+
/* non-lru isolated movable page */
PG_isolated = PG_reclaim,
@@ -668,6 +677,20 @@ PAGEFLAG_FALSE(DoubleMap)
TESTSCFLAG_FALSE(DoubleMap)
#endif
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+/*
+ * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
+ * compound page.
+ *
+ * This flag is set by hwpoison handler. Cleared by THP split or free page.
+ */
+PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+ TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+#else
+PAGEFLAG_FALSE(HasHWPoisoned)
+ TESTSCFLAG_FALSE(HasHWPoisoned)
+#endif
+
/*
* Check if a page is currently marked HWPoisoned. Note that this check is
* best effort only and inherently racy: there is no way to synchronize with
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 92192cb086c7..c5142d237e48 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2426,6 +2426,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
lruvec = lock_page_lruvec(head);
+ ClearPageHasHWPoisoned(head);
+
for (i = nr - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond EOF: drop them from page cache */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 73f68699e7ab..bdbbb32211a5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1694,6 +1694,20 @@ int memory_failure(unsigned long pfn, int flags)
}
if (PageTransHuge(hpage)) {
+ /*
+ * The flag must be set after the refcount is bumped
+ * otherwise it may race with THP split.
+ * And the flag can't be set in get_hwpoison_page() since
+ * it is called by soft offline too and it is just called
+ * for !MF_COUNT_INCREASE. So here seems to be the best
+ * place.
+ *
+ * Don't need care about the above error handling paths for
+ * get_hwpoison_page() since they handle either free page
+ * or unhandlable page. The refcount is bumped iff the
+ * page is a valid handlable page.
+ */
+ SetPageHasHWPoisoned(hpage);
if (try_to_split_thp_page(p, "Memory Failure") < 0) {
action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
res = -EBUSY;
diff --git a/mm/memory.c b/mm/memory.c
index adf9b9ef8277..c52be6d6b605 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3906,6 +3906,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
if (compound_order(page) != HPAGE_PMD_ORDER)
return ret;
+ /*
+ * Just backoff if any subpage of a THP is corrupted otherwise
+ * the corrupted page may be mapped by PMD silently to escape the
+ * check. This kind of THP just can be PTE mapped. Access to
+ * the corrupted subpage should trigger SIGBUS as expected.
+ */
+ if (unlikely(PageHasHWPoisoned(page)))
+ return ret;
+
/*
* Archs like ppc64 need additional space to store information
* related to pte entry. Use the preallocated table for that.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3ec39552d00f..23d3339ac4e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1312,8 +1312,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
- if (compound)
+ if (compound) {
ClearPageDoubleMap(page);
+ ClearPageHasHWPoisoned(page);
+ }
for (i = 1; i < (1 << order); i++) {
if (compound)
bad += free_tail_pages_check(page, page + i);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
This is a note to let you know that I've just added the patch titled
usb: gadget: Mark USB_FSL_QE broken on 64-bit
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From a0548b26901f082684ad1fb3ba397d2de3a1406a Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert(a)linux-m68k.org>
Date: Wed, 27 Oct 2021 10:08:49 +0200
Subject: usb: gadget: Mark USB_FSL_QE broken on 64-bit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On 64-bit:
drivers/usb/gadget/udc/fsl_qe_udc.c: In function ‘qe_ep0_rx’:
drivers/usb/gadget/udc/fsl_qe_udc.c:842:13: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
842 | vaddr = (u32)phys_to_virt(in_be32(&bd->buf));
| ^
In file included from drivers/usb/gadget/udc/fsl_qe_udc.c:41:
drivers/usb/gadget/udc/fsl_qe_udc.c:843:28: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
843 | frame_set_data(pframe, (u8 *)vaddr);
| ^
The driver assumes physical and virtual addresses are 32-bit, hence it
cannot work on 64-bit platforms.
Acked-by: Li Yang <leoyang.li(a)nxp.com>
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Link: https://lore.kernel.org/r/20211027080849.3276289-1-geert@linux-m68k.org
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/udc/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/udc/Kconfig b/drivers/usb/gadget/udc/Kconfig
index 8c614bb86c66..69394dc1cdfb 100644
--- a/drivers/usb/gadget/udc/Kconfig
+++ b/drivers/usb/gadget/udc/Kconfig
@@ -330,6 +330,7 @@ config USB_AMD5536UDC
config USB_FSL_QE
tristate "Freescale QE/CPM USB Device Controller"
depends on FSL_SOC && (QUICC_ENGINE || CPM)
+ depends on !64BIT || BROKEN
help
Some of Freescale PowerPC processors have a Full Speed
QE/CPM2 USB controller, which support device mode with 4
--
2.33.1
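The underlying portability problem can be shown in isolation (a minimal sketch, not the driver's code; the function names are illustrative):

#include <stdint.h>

/* On 64-bit, a virtual address does not fit in a u32: the round trip
 * through a 32-bit integer silently drops the upper half. */
void *broken_round_trip(void *virt)
{
	uint32_t vaddr = (uint32_t)(uintptr_t)virt; /* truncates */
	return (void *)(uintptr_t)vaddr;            /* wrong pointer */
}

/* A pointer-sized integer (unsigned long in the kernel, uintptr_t in
 * userspace) round-trips safely on both 32- and 64-bit. */
void *safe_round_trip(void *virt)
{
	uintptr_t vaddr = (uintptr_t)virt;
	return (void *)vaddr;
}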
This is a note to let you know that I've just added the patch titled
coresight: trbe: Defer the probe on offline CPUs
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From a08025b3fe56185290a1ea476581f03ca733f967 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 14 Oct 2021 15:22:38 +0100
Subject: coresight: trbe: Defer the probe on offline CPUs
If a CPU is offline during the driver init, we could end up causing
a kernel crash trying to register the coresight device for the TRBE
instance. The trbe_cpudata for the TRBE instance is initialized only
when it is probed. Otherwise, we could end up dereferencing a NULL
cpudata->drvdata.
e.g:
[ 0.149999] coresight ete0: CPU0: ete v1.1 initialized
[ 0.149999] coresight-etm4x ete_1: ETM arch init failed
[ 0.149999] coresight-etm4x: probe of ete_1 failed with error -22
[ 0.150085] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
[ 0.150085] Mem abort info:
[ 0.150085] ESR = 0x96000005
[ 0.150085] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.150085] SET = 0, FnV = 0
[ 0.150085] EA = 0, S1PTW = 0
[ 0.150085] Data abort info:
[ 0.150085] ISV = 0, ISS = 0x00000005
[ 0.150085] CM = 0, WnR = 0
[ 0.150085] [0000000000000050] user address but active_mm is swapper
[ 0.150085] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 0.150085] Modules linked in:
[ 0.150085] Hardware name: FVP Base RevC (DT)
[ 0.150085] pstate: 00800009 (nzcv daif -PAN +UAO -TCO BTYPE=--)
[ 0.150155] pc : arm_trbe_register_coresight_cpu+0x74/0x144
[ 0.150155] lr : arm_trbe_register_coresight_cpu+0x48/0x144
...
[ 0.150237] Call trace:
[ 0.150237] arm_trbe_register_coresight_cpu+0x74/0x144
[ 0.150237] arm_trbe_device_probe+0x1c0/0x2d8
[ 0.150259] platform_drv_probe+0x94/0xbc
[ 0.150259] really_probe+0x1bc/0x4a8
[ 0.150266] driver_probe_device+0x7c/0xb8
[ 0.150266] device_driver_attach+0x6c/0xac
[ 0.150266] __driver_attach+0xc4/0x148
[ 0.150266] bus_for_each_dev+0x7c/0xc8
[ 0.150266] driver_attach+0x24/0x30
[ 0.150266] bus_add_driver+0x100/0x1e0
[ 0.150266] driver_register+0x78/0x110
[ 0.150266] __platform_driver_register+0x44/0x50
[ 0.150266] arm_trbe_init+0x28/0x84
[ 0.150266] do_one_initcall+0x94/0x2bc
[ 0.150266] do_initcall_level+0xa4/0x158
[ 0.150266] do_initcalls+0x54/0x94
[ 0.150319] do_basic_setup+0x24/0x30
[ 0.150319] kernel_init_freeable+0xe8/0x14c
[ 0.150319] kernel_init+0x14/0x18c
[ 0.150319] ret_from_fork+0x10/0x30
[ 0.150319] Code: f94012c8 b0004ce2 9134a442 52819801 (f9402917)
[ 0.150319] ---[ end trace d23e0cfe5098535e ]---
[ 0.150346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Fix this by skipping the step, if we are unable to probe the CPU.
Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
Reported-by: Branislav Rankov <branislav.rankov(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Mike Leach <mike.leach(a)linaro.org>
Cc: Leo Yan <leo.yan(a)linaro.org>
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Branislav Rankov <branislav.rankov(a)arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Link: https://lore.kernel.org/r/20211014142238.2221248-1-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
---
drivers/hwtracing/coresight/coresight-trbe.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 2825ccb0cf39..5d77baba8b0f 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -893,6 +893,10 @@ static void arm_trbe_register_coresight_cpu(struct trbe_drvdata *drvdata, int cp
if (WARN_ON(trbe_csdev))
return;
+ /* If the TRBE was not probed on the CPU, we shouldn't be here */
+ if (WARN_ON(!cpudata->drvdata))
+ return;
+
dev = &cpudata->drvdata->pdev->dev;
desc.name = devm_kasprintf(dev, GFP_KERNEL, "trbe%d", cpu);
if (!desc.name)
@@ -974,7 +978,9 @@ static int arm_trbe_probe_coresight(struct trbe_drvdata *drvdata)
return -ENOMEM;
for_each_cpu(cpu, &drvdata->supported_cpus) {
- smp_call_function_single(cpu, arm_trbe_probe_cpu, drvdata, 1);
+ /* If we fail to probe the CPU, let us defer it to hotplug callbacks */
+ if (smp_call_function_single(cpu, arm_trbe_probe_cpu, drvdata, 1))
+ continue;
if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
arm_trbe_register_coresight_cpu(drvdata, cpu);
if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
--
2.33.1
This is a note to let you know that I've just added the patch titled
coresight: trbe: Fix incorrect access of the sink specific data
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From bb5293e334af51b19b62d8bef1852ea13e935e9b Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Tue, 21 Sep 2021 14:41:05 +0100
Subject: coresight: trbe: Fix incorrect access of the sink specific data
The TRBE driver wrongly treats the aux private data as the TRBE driver's
specific buffer for a given perf handle, while it is actually the ETM PMU's
event-specific data. Fix this by correcting the instance to use the
appropriate helper.
Cc: stable <stable(a)vger.kernel.org>
Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Link: https://lore.kernel.org/r/20210921134121.2423546-2-suzuki.poulose@arm.com
[Fixed 13 character SHA down to 12]
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
---
drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index a53ee98f312f..2825ccb0cf39 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -382,7 +382,7 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
{
- struct trbe_buf *buf = perf_get_aux(handle);
+ struct trbe_buf *buf = etm_perf_sink_config(handle);
u64 limit = __trbe_normal_offset(handle);
u64 head = PERF_IDX2OFF(handle->head, buf);
--
2.33.1
This is a note to let you know that I've just added the patch titled
coresight: cti: Correct the parameter for pm_runtime_put
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 692c9a499b286ea478f41b23a91fe3873b9e1326 Mon Sep 17 00:00:00 2001
From: Tao Zhang <quic_taozha(a)quicinc.com>
Date: Thu, 19 Aug 2021 17:29:37 +0800
Subject: coresight: cti: Correct the parameter for pm_runtime_put
The input parameter of the function pm_runtime_put should be the
same in the functions cti_enable_hw and cti_disable_hw. The correct
parameter to use here is dev->parent.
Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
Reviewed-by: Leo Yan <leo.yan(a)linaro.org>
Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1629365377-5937-1-git-send-email-quic_taozha@quic…
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
---
drivers/hwtracing/coresight/coresight-cti-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index e2a3620cbf48..8988b2ed2ea6 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -175,7 +175,7 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
coresight_disclaim_device_unlocked(csdev);
CS_LOCK(drvdata->base);
spin_unlock(&drvdata->spinlock);
- pm_runtime_put(dev);
+ pm_runtime_put(dev->parent);
return 0;
/* not disabled this call */
--
2.33.1
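The invariant being restored is that a runtime-PM get and its matching put must target the same device. An abbreviated sketch of the pairing described in the commit message (hardware programming elided; the surrounding types come from the driver):

#include <linux/pm_runtime.h>

static int cti_enable_hw(struct cti_drvdata *drvdata)
{
	struct device *dev = &drvdata->csdev->dev;

	pm_runtime_get_sync(dev->parent);	/* get on the parent... */
	/* ... claim and program the CTI hardware ... */
	return 0;
}

static int cti_disable_hw(struct cti_drvdata *drvdata)
{
	struct device *dev = &drvdata->csdev->dev;

	/* ... disable the hardware, drop the claim ... */
	pm_runtime_put(dev->parent);		/* ...so put the parent too */
	return 0;
}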
HID_QUIRK_INVERT caused BTN_TOOL_RUBBER events to be reported at the
same time as events for BTN_TOOL_PEN/PENCIL/etc. when HID_QUIRK_INVERT
was set by a stylus' side switch. In reality a pen can only be
a stylus (writing/drawing) or an eraser, but not both at the same time.
This patch makes that logic correct.
CC: stable(a)vger.kernel.org # 2.4+
Signed-off-by: Ping Cheng <ping.cheng(a)wacom.com>
Reviewed-by: Jason Gerecke <killertofu(a)gmail.com>
Tested-by: Tatsunosuke Tobita <junkpainting(a)gmail.com>
---
drivers/hid/hid-input.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c
index 4b5ebeacd283..85741a2d828d 100644
--- a/drivers/hid/hid-input.c
+++ b/drivers/hid/hid-input.c
@@ -1344,12 +1344,28 @@ void hidinput_hid_event(struct hid_device *hid, struct hid_field *field, struct
}
if (usage->hid == HID_DG_INRANGE) {
+ /* when HID_QUIRK_INVERT is set by a stylus sideswitch, HID_DG_INRANGE could be
+ * for stylus or eraser. Make sure events are only posted to the current valid tool:
+ * BTN_TOOL_RUBBER vs BTN_TOOL_PEN/BTN_TOOL_PENCIL/BTN_TOOL_BRUSH/etc since a pen
+ * can not be used as a stylus (to draw/write) and an eraser at the same time
+ */
+ static unsigned int last_code = 0;
+ unsigned int code = (*quirks & HID_QUIRK_INVERT) ? BTN_TOOL_RUBBER : usage->code;
if (value) {
- input_event(input, usage->type, (*quirks & HID_QUIRK_INVERT) ? BTN_TOOL_RUBBER : usage->code, 1);
- return;
+ if (code != last_code) {
+ /* send the last tool out before allowing the new one in */
+ if (last_code)
+ input_event(input, usage->type, last_code, 0);
+ input_event(input, usage->type, code, 1);
+ }
+ last_code = code;
+ } else {
+ /* only send the last valid tool out */
+ if (last_code)
+ input_event(input, usage->type, last_code, 0);
+ /* reset tool for next cycle */
+ last_code = 0;
}
- input_event(input, usage->type, usage->code, 0);
- input_event(input, usage->type, BTN_TOOL_RUBBER, 0);
return;
}
--
2.25.1
Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
"PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
accidentally (?) set blocking=true.
This can cause upwards of 60-100ms of unneeded latency when exiting
self-refresh, which can cause very noticeable lag when, say, moving a
cursor.
Presumably it's OK to let the display finish exiting self-refresh in parallel
with clocking out the next video frames, so we shouldn't hold up the
atomic_enable() step. This also brings behavior in line with the
downstream ("mainline-derived") variant of the driver currently deployed
to Chrome OS Rockchip systems.
Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin).
Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: <stable(a)vger.kernel.org>
Cc: Zain Wang <wzz(a)rock-chips.com>
Cc: Tomasz Figa <tfiga(a)chromium.org>
Cc: Heiko Stuebner <heiko(a)sntech.de>
Cc: Sean Paul <seanpaul(a)chromium.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
CC list is partially constructed from the commit message of the Fixed
commit
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..fbe6eb9df310 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1055,7 +1055,7 @@ static int analogix_dp_disable_psr(struct analogix_dp_device *dp)
psr_vsc.db[0] = 0;
psr_vsc.db[1] = 0;
- return analogix_dp_send_psr_spd(dp, &psr_vsc, true);
+ return analogix_dp_send_psr_spd(dp, &psr_vsc, false);
}
/*
--
2.33.0.1079.g6e70778dc9-goog
When running as a PVH or HVM guest with actual memory < max memory,
the hypervisor uses "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size, as otherwise the PoD cache will be exhausted and the
guest will be crashed as a result.
Today, in extreme cases, ballooning down might not have finished
before the init process, which can consume lots of memory, is started.
In order to avoid random boot crashes in such cases, add a late init
call that waits for ballooning down to finish for PVH/HVM guests.
Cc: <stable(a)vger.kernel.org>
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
drivers/xen/balloon.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..d19b851c3d3b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -765,3 +765,23 @@ static int __init balloon_init(void)
return 0;
}
subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+ if (!xen_domain())
+ return -ENODEV;
+
+ /* PV guests don't need to wait. */
+ if (xen_pv_domain() || !current_credit())
+ return 0;
+
+ pr_info("Waiting for initial ballooning down having finished.\n");
+
+ while (current_credit())
+ schedule_timeout_interruptible(HZ / 10);
+
+ pr_info("Initial ballooning down finished.\n");
+
+ return 0;
+}
+late_initcall_sync(balloon_wait_finish);
--
2.26.2
When running as a PVH or HVM guest with actual memory < max memory,
the hypervisor uses "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size, as otherwise the PoD cache will be exhausted and the
guest will be crashed as a result.
Today, in extreme cases, ballooning down might not have finished
before the init process, which can consume lots of memory, is started.
In order to avoid random boot crashes in such cases, add a late init
call that waits for ballooning down to finish for PVH/HVM guests.
Warn on the console if ballooning stalls for more than 1 minute, and
panic() after it stalls for more than 3 minutes.
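As a side note on the timing logic, the hunk below compares raw jiffies
deltas; a roughly equivalent formulation uses the kernel's time_after()
helper. A minimal sketch, not the submitted code:

#include <linux/jiffies.h>
#include <linux/kernel.h>

/* Sketch: same 60s warn / 3min panic thresholds, via time_after(). */
static void example_check_stall(unsigned long last_changed, long credit)
{
	if (time_after(jiffies, last_changed + 60 * HZ))
		pr_warn_once("Initial ballooning stalled, %ld pages left to free.\n",
			     -credit);
	if (time_after(jiffies, last_changed + 3 * 60 * HZ))
		panic("Initial ballooning failed!\n");
}

The open-coded unsigned subtraction in the hunk below behaves the same
for deltas well under the jiffies wrap interval, so both forms work for
the timeouts involved here.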
Cc: <stable(a)vger.kernel.org>
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
V2:
- add warning and panic() when stalling (Marek Marczykowski-Górecki)
- don't wait if credit > 0
---
drivers/xen/balloon.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..c0c1c754e515 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -765,3 +765,38 @@ static int __init balloon_init(void)
return 0;
}
subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+ long credit, last_credit = 0;
+ unsigned long last_changed;
+
+ if (!xen_domain())
+ return -ENODEV;
+
+ /* PV guests don't need to wait. */
+ if (xen_pv_domain() || !current_credit())
+ return 0;
+
+ pr_info("Waiting for initial ballooning down having finished.\n");
+
+ while ((credit = current_credit()) < 0) {
+ if (credit != last_credit) {
+ last_changed = jiffies;
+ last_credit = credit;
+ }
+ if (jiffies - last_changed >= HZ * 60) {
+ pr_warn_once("Initial ballooning stalling for 60s, %ld pages need to be freed.\n",
+ -credit);
+ }
+ if (jiffies - last_changed >= 3 * HZ * 60)
+ panic("Initial ballooning failed!\n");
+
+ schedule_timeout_interruptible(HZ / 10);
+ }
+
+ pr_info("Initial ballooning down finished.\n");
+
+ return 0;
+}
+late_initcall_sync(balloon_wait_finish);
--
2.26.2
From: Yang Shi <shy828301(a)gmail.com>
Subject: mm: khugepaged: skip huge page collapse for special files
The read-only THP for filesystems will collapse THP for files opened
readonly and mapped with VM_EXEC. The intended usecase is to avoid TLB
misses for large text segments. But it doesn't restrict the file types so
a THP could be collapsed for a non-regular file, for example, block
device, if it is opened readonly and mapped with EXEC permission. This
may cause bugs, like [1] and [2].
This is definitely not the intended usecase, so just collapse THP for
regular files in order to close the attack surface.
[1] https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx…
[2] https://lore.kernel.org/linux-mm/000000000000c6a82505ce284e4c@google.com/
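For illustration only, a hedged userspace sketch of the kind of mapping
the old check accepted: a non-regular file (e.g. a block device) opened
read-only and mapped with EXEC permission, making it eligible for
collapse before this fix. This is a sketch, not the reproducer from
[1]/[2]:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <non-regular file>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Read-only + VM_EXEC private mapping: what khugepaged used to accept. */
	void *p = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_EXEC,
		       MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	/* A real reproducer would touch the mapping and wait for khugepaged. */
	munmap(p, 2 * 1024 * 1024);
	close(fd);
	return 0;
}

With the fix, the S_ISREG() check in hugepage_vma_check() rejects such
mappings.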
[shy828301(a)gmail.com: fix vm_file check]
Link: https://lkml.kernel.org/r/CAHbLzkqTW9U3VvTu1Ki5v_cLRC9gHW+znBukg_ycergE0JWj…
Link: https://lkml.kernel.org/r/20211027195221.3825-1-shy828301@gmail.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Reported-by: syzbot+aae069be1de40fb11825(a)syzkaller.appspotmail.com
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Andrea Righi <andrea.righi(a)canonical.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
--- a/mm/khugepaged.c~mm-khugepaged-skip-huge-page-collapse-for-special-files
+++ a/mm/khugepaged.c
@@ -445,22 +445,25 @@ static bool hugepage_vma_check(struct vm
if (!transhuge_vma_enabled(vma, vm_flags))
return false;
+ if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
+ vma->vm_pgoff, HPAGE_PMD_NR))
+ return false;
+
/* Enabled via shmem mount options or sysfs settings. */
- if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
- return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
- HPAGE_PMD_NR);
- }
+ if (shmem_file(vma->vm_file))
+ return shmem_huge_enabled(vma);
/* THP settings require madvise. */
if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
return false;
- /* Read-only file mappings need to be aligned for THP to work. */
+ /* Only regular file is valid */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
- !inode_is_open_for_write(vma->vm_file->f_inode) &&
(vm_flags & VM_EXEC)) {
- return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
- HPAGE_PMD_NR);
+ struct inode *inode = vma->vm_file->f_inode;
+
+ return !inode_is_open_for_write(inode) &&
+ S_ISREG(inode->i_mode);
}
if (!vma->anon_vma || vma->vm_ops)
_
From: Yang Shi <shy828301(a)gmail.com>
Subject: mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
When handling a shmem page fault, a THP with a corrupted subpage could be
PMD mapped if certain conditions are satisfied. But the kernel is supposed
to send SIGBUS when trying to map a hwpoisoned page.
There are two paths which may do the PMD map: fault-around and regular
fault. Before commit f9ce0be71d1f ("mm: Cleanup faultaround and
finish_fault() codepaths") things were even worse in the fault-around
path: the THP could be PMD mapped as long as the VMA fit, regardless of
which subpage was accessed and corrupted. After this commit, the THP can
still be PMD mapped as long as the head page is not corrupted.
In the regular fault path the THP could be PMD mapped as long as the
corrupted page is not accessed and the VMA fits.
This loophole could be fixed by iterating over every subpage to check
whether any of them is hwpoisoned, but that is somewhat costly in the
page fault path.
So introduce a new page flag called HasHWPoisoned on the first tail page.
It indicates that the THP has hwpoisoned subpage(s). It is set if any
subpage of the THP is found hwpoisoned by memory failure, after the
refcount has been bumped successfully, and is cleared when the THP is
freed or split.
The soft offline path doesn't need this, since the soft offline handler
just marks a subpage hwpoisoned when the subpage is migrated
successfully. But shmem THPs didn't get split and then migrated at all.
Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@gmail.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/page-flags.h | 23 +++++++++++++++++++++++
mm/huge_memory.c | 2 ++
mm/memory-failure.c | 14 ++++++++++++++
mm/memory.c | 9 +++++++++
mm/page_alloc.c | 4 +++-
5 files changed, 51 insertions(+), 1 deletion(-)
--- a/include/linux/page-flags.h~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/include/linux/page-flags.h
@@ -171,6 +171,15 @@ enum pageflags {
/* Compound pages. Stored in first tail page's flags */
PG_double_map = PG_workingset,
+#ifdef CONFIG_MEMORY_FAILURE
+ /*
+ * Compound pages. Stored in first tail page's flags.
+ * Indicates that at least one subpage is hwpoisoned in the
+ * THP.
+ */
+ PG_has_hwpoisoned = PG_mappedtodisk,
+#endif
+
/* non-lru isolated movable page */
PG_isolated = PG_reclaim,
@@ -668,6 +677,20 @@ PAGEFLAG_FALSE(DoubleMap)
TESTSCFLAG_FALSE(DoubleMap)
#endif
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+/*
+ * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
+ * compound page.
+ *
+ * This flag is set by hwpoison handler. Cleared by THP split or free page.
+ */
+PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+ TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+#else
+PAGEFLAG_FALSE(HasHWPoisoned)
+ TESTSCFLAG_FALSE(HasHWPoisoned)
+#endif
+
/*
* Check if a page is currently marked HWPoisoned. Note that this check is
* best effort only and inherently racy: there is no way to synchronize with
--- a/mm/huge_memory.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/huge_memory.c
@@ -2426,6 +2426,8 @@ static void __split_huge_page(struct pag
/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
lruvec = lock_page_lruvec(head);
+ ClearPageHasHWPoisoned(head);
+
for (i = nr - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond EOF: drop them from page cache */
--- a/mm/memory.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/memory.c
@@ -3907,6 +3907,15 @@ vm_fault_t do_set_pmd(struct vm_fault *v
return ret;
/*
+ * Just back off if any subpage of a THP is corrupted, otherwise
+ * the corrupted page may be PMD mapped silently to escape the
+ * check. This kind of THP can only be PTE mapped. Access to
+ * the corrupted subpage should trigger SIGBUS as expected.
+ */
+ if (unlikely(PageHasHWPoisoned(page)))
+ return ret;
+
+ /*
* Archs like ppc64 need additional space to store information
* related to pte entry. Use the preallocated table for that.
*/
--- a/mm/memory-failure.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/memory-failure.c
@@ -1694,6 +1694,20 @@ try_again:
}
if (PageTransHuge(hpage)) {
+ /*
+ * The flag must be set after the refcount is bumped
+ * otherwise it may race with THP split.
+ * And the flag can't be set in get_hwpoison_page() since
+ * it is called by soft offline too and it is just called
+ * for !MF_COUNT_INCREASED. So here seems to be the best
+ * place.
+ *
+ * No need to care about the above error handling paths for
+ * get_hwpoison_page() since they handle either a free page
+ * or an unhandlable page. The refcount is bumped iff the
+ * page is a valid handlable page.
+ */
+ SetPageHasHWPoisoned(hpage);
if (try_to_split_thp_page(p, "Memory Failure") < 0) {
action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
res = -EBUSY;
--- a/mm/page_alloc.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/page_alloc.c
@@ -1312,8 +1312,10 @@ static __always_inline bool free_pages_p
VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
- if (compound)
+ if (compound) {
ClearPageDoubleMap(page);
+ ClearPageHasHWPoisoned(page);
+ }
for (i = 1; i < (1 << order); i++) {
if (compound)
bad += free_tail_pages_check(page, page + i);
_
From: Yang Shi <shy828301(a)gmail.com>
Subject: mm: hwpoison: remove the unnecessary THP check
When handling a THP, hwpoison checked whether the THP was in the
allocation or free stage, since hwpoison may mistreat it as a hugetlb
page. After commit 415c64c1453a ("mm/memory-failure: split thp earlier
in memory error handling") the problem has been fixed, so this check is
no longer needed. Remove it. The side effect of the removal is that
hwpoison may report unsplit THP instead of unknown error for shmem THP.
That does not seem like a big deal.
The following patch "mm: filemap: check if THP has hwpoisoned subpage
for PMD page fault" depends on this; it fixes shmem THPs with
hwpoisoned subpage(s) being wrongly PMD mapped. So this patch needs to
be backported to -stable as well.
Link: https://lkml.kernel.org/r/20211020210755.23964-2-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Suggested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 14 --------------
1 file changed, 14 deletions(-)
--- a/mm/memory-failure.c~mm-hwpoison-remove-the-unnecessary-thp-check
+++ a/mm/memory-failure.c
@@ -1147,20 +1147,6 @@ static int __get_hwpoison_page(struct pa
if (!HWPoisonHandlable(head))
return -EBUSY;
- if (PageTransHuge(head)) {
- /*
- * Non anonymous thp exists only in allocation/free time. We
- * can't handle such a case correctly, so let's give it up.
- * This should be better than triggering BUG_ON when kernel
- * tries to touch the "partially handled" page.
- */
- if (!PageAnon(head)) {
- pr_err("Memory failure: %#lx: non anonymous thp\n",
- page_to_pfn(page));
- return 0;
- }
- }
-
if (get_page_unless_zero(head)) {
if (head == compound_head(page))
return 1;
_
Add the missing endpoint sanity checks to probe() to avoid division by
zero in mwifiex_write_data_sync() in case a malicious device has broken
descriptors (or when doing descriptor fuzz testing).
For now, only add checks for the firmware-download boot stage, which
requires both command endpoints. The driver looks like it will handle
a missing endpoint during normal operation without oopsing, albeit not
very gracefully, as it will try to submit URBs to the default pipe and
fail.
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: 4daffe354366 ("mwifiex: add support for Marvell USB8797 chipset")
Cc: stable(a)vger.kernel.org # 3.5
Cc: Amitkumar Karwar <akarwar(a)marvell.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/net/wireless/marvell/mwifiex/usb.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c b/drivers/net/wireless/marvell/mwifiex/usb.c
index 426e39d4ccf0..9736aa0ab7fd 100644
--- a/drivers/net/wireless/marvell/mwifiex/usb.c
+++ b/drivers/net/wireless/marvell/mwifiex/usb.c
@@ -505,6 +505,22 @@ static int mwifiex_usb_probe(struct usb_interface *intf,
}
}
+ switch (card->usb_boot_state) {
+ case USB8XXX_FW_DNLD:
+ /* Reject broken descriptors. */
+ if (!card->rx_cmd_ep || !card->tx_cmd_ep)
+ return -ENODEV;
+ if (card->bulk_out_maxpktsize == 0)
+ return -ENODEV;
+ break;
+ case USB8XXX_FW_READY:
+ /* Assume the driver can handle missing endpoints for now. */
+ break;
+ default:
+ WARN_ON(1);
+ return -ENODEV;
+ }
+
usb_set_intfdata(intf, card);
ret = mwifiex_add_card(card, &card->fw_done, &usb_ops,
--
2.32.0
Add the missing endpoint max-packet sanity check to probe() to avoid
division by zero in ath10k_usb_hif_tx_sg() in case a malicious device
has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: 4db66499df91 ("ath10k: add initial USB support")
Cc: stable(a)vger.kernel.org # 4.14
Cc: Erik Stromdahl <erik.stromdahl(a)gmail.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/usb.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/usb.c b/drivers/net/wireless/ath/ath10k/usb.c
index 6d831b098cbb..3d98f19c6ec8 100644
--- a/drivers/net/wireless/ath/ath10k/usb.c
+++ b/drivers/net/wireless/ath/ath10k/usb.c
@@ -853,6 +853,11 @@ static int ath10k_usb_setup_pipe_resources(struct ath10k *ar,
le16_to_cpu(endpoint->wMaxPacketSize),
endpoint->bInterval);
}
+
+ /* Ignore broken descriptors. */
+ if (usb_endpoint_maxp(endpoint) == 0)
+ continue;
+
urbcount = 0;
pipe_num =
--
2.32.0
The patch titled
Subject: ipc: WARN if trying to remove ipc object which is absent
has been added to the -mm tree. Its filename is
ipc-warn-if-trying-to-remove-ipc-object-which-is-absent.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/ipc-warn-if-trying-to-remove-ipc-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/ipc-warn-if-trying-to-remove-ipc-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Subject: ipc: WARN if trying to remove ipc object which is absent
Patch series "shm: shm_rmid_forced feature fixes".
Some time ago I hit a kernel crash after a CRIU restore procedure;
fortunately, since it was a CRIU restore, I had dump files and could
redo the restore many times, and the crash reproduced easily. After
some investigation I constructed a minimal reproducer. It turned out
to be a use-after-free which happens only if sysctl
kernel.shm_rmid_forced = 1.
The key to the problem is that the exit_shm() function does not handle
destroying shp objects when task->sysvshm.shm_clist contains items from
different IPC namespaces. In most cases this list will contain only
items from one IPC namespace.
Why may this list contain objects from different namespaces? The
exit_shm() function is designed to clean up this list whenever a
process leaves its IPC namespace. But a long time ago we made the
mistake of not adding an exit_shm() call to the setns() syscall
procedures. The first idea was just to add this call to setns(), but
that obviously changes the semantics of the setns() syscall and is a
userspace-visible change. So, I gave up on this idea.
The first real attempt to address the issue was simply to omit the
forced destroy if we meet an shp object that is not from the current
task's IPC namespace [1]. But that was not the best idea, because
task->sysvshm.shm_clist was protected by an rwsem belonging to the
current task's IPC namespace, which means that list corruption may
occur.
The second approach was to extend exit_shm() to properly handle shp's
from different IPC namespaces [2]. This is a really non-trivial thing;
I put a lot of effort into it but did not believe it was possible to
make it fully safe, clean and clear.
Thanks to the efforts of Manfred Spraul, an elegant solution was
designed. Thanks a lot, Manfred!
Eric also suggested a way to address the issue in ("[RFC][PATCH] shm:
In shm_exit destroy all created and never attached segments"). Eric's
idea was to maintain a list of shm_clists, one per IPC namespace, using
lock-less lists. But there are some extra memory-consumption concerns.
An alternative solution, which I suggested, was implemented in ("shm:
reset shm_clist on setns but omit forced shm destroy"). The idea is
pretty simple: we add an exit_shm() call to setns() but DO NOT destroy
shm segments even if sysctl kernel.shm_rmid_forced = 1; we just clean
up the task->sysvshm.shm_clist list. This changes the semantics of the
setns() syscall a little bit, but in comparison to the "naive" solution
of just adding exit_shm() without any special exclusions, this looks
like the safer option.
[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736
This patch (of 2):
Let's produce a warning if we are trying to remove a non-existent IPC
object from the IPC namespace's kht/idr structures.
This allows us to catch possible bugs when ipc_rmid() is called with
inconsistent struct ipc_ids* / struct kern_ipc_perm* arguments.
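As a side note on why the return-value check catches such bugs, a
hedged sketch of the underlying idr contract; the helper is
hypothetical, not the ipc/util.c code:

#include <linux/bug.h>
#include <linux/idr.h>

/*
 * Hypothetical illustration: idr_remove() returns the pointer that was
 * stored at the id, so comparing it with the object we believe we are
 * removing detects a mismatched (ids, ipcp) pair.
 */
static void example_checked_remove(struct idr *idr, unsigned long id,
				   void *expected)
{
	WARN_ON_ONCE(idr_remove(idr, id) != expected);
}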
Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@vir…
Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@vir…
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/util.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/ipc/util.c~ipc-warn-if-trying-to-remove-ipc-object-which-is-absent
+++ a/ipc/util.c
@@ -447,8 +447,8 @@ static int ipcget_public(struct ipc_name
static void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
if (ipcp->key != IPC_PRIVATE)
- rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
- ipc_kht_params);
+ WARN_ON_ONCE(rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
+ ipc_kht_params));
}
/**
@@ -498,7 +498,7 @@ void ipc_rmid(struct ipc_ids *ids, struc
{
int idx = ipcid_to_idx(ipcp->id);
- idr_remove(&ids->ipcs_idr, idx);
+ WARN_ON_ONCE(idr_remove(&ids->ipcs_idr, idx) != ipcp);
ipc_kht_remove(ids, ipcp);
ids->in_use--;
ipcp->deleted = true;
_
Patches currently in -mm which might be from alexander.mikhalitsyn(a)virtuozzo.com are
ipc-warn-if-trying-to-remove-ipc-object-which-is-absent.patch
shm-extend-forced-shm-destroy-to-support-objects-from-several-ipc-nses.patch
The patch titled
Subject: memcg: prohibit unconditional exceeding the limit of dying tasks
has been added to the -mm tree. Its filename is
memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/memcg-prohibit-unconditional-exce…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/memcg-prohibit-unconditional-exce…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Vasily Averin <vvs(a)virtuozzo.com>
Subject: memcg: prohibit unconditional exceeding the limit of dying tasks
Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit. It is assumed that the amount of memory charged by those tasks
is bounded and that most of the memory will get released while the task
is exiting. This resembles the heuristic for the global OOM situation,
when tasks get access to memory reserves. There is no global memory
shortage at the memcg level, so the memcg heuristic is more relaxed.
The above assumption is overly optimistic though. E.g. vmalloc can scale
to really large requests and the heuristic would allow that. We used to
have an early break in the vmalloc allocator for killed tasks but this has
been reverted by commit b8c8a338f75e ("Revert "vmalloc: back off when the
current task is killed""). There are likely other similar code paths
which do not check for fatal signals in an allocation&charge loop. Also
there are some kernel objects charged to a memcg which are not bound to a
process life time.
It has been observed that it is not really hard to trigger these bypasses
and cause global OOM situation.
One potential way to address these runaways would be to limit the amount
of excess (similar to the global OOM with limited oom reserves). This is
certainly possible but it is not really clear how much of an excess is
desirable and still protects from global OOMs as that would have to
consider the overall memcg configuration.
This patch is addressing the problem by removing the heuristic altogether.
Bypass is only allowed for requests which either cannot fail or where the
failure is not desirable while excess should be still limited (e.g.
atomic requests). Implementation wise a killed or dying task fails to
charge if it has passed the OOM killer stage. That should give all forms
of reclaim chance to restore the limit before the failure (ENOMEM) and
tell the caller to back off.
In addition, this patch renames the should_force_charge() helper to
task_is_dying(), because its use is no longer associated with forced
charging.
This patch depends on pagefault_out_of_memory() to not trigger
out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM
and cause a global OOM killer.
Link: https://lkml.kernel.org/r/8f5cebbb-06da-4902-91f0-6566fc4b4203@virtuozzo.com
Signed-off-by: Vasily Averin <vvs(a)virtuozzo.com>
Suggested-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Uladzislau Rezki <urezki(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)
--- a/mm/memcontrol.c~memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks
+++ a/mm/memcontrol.c
@@ -234,7 +234,7 @@ enum res_type {
iter != NULL; \
iter = mem_cgroup_iter(NULL, iter, NULL))
-static inline bool should_force_charge(void)
+static inline bool task_is_dying(void)
{
return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
(current->flags & PF_EXITING);
@@ -1624,7 +1624,7 @@ static bool mem_cgroup_out_of_memory(str
* A few threads which were not waiting at mutex_lock_killable() can
* fail to bail out. Therefore, check again after holding oom_lock.
*/
- ret = should_force_charge() || out_of_memory(&oc);
+ ret = task_is_dying() || out_of_memory(&oc);
unlock:
mutex_unlock(&oom_lock);
@@ -2579,6 +2579,7 @@ static int try_charge_memcg(struct mem_c
struct page_counter *counter;
enum oom_status oom_status;
unsigned long nr_reclaimed;
+ bool passed_oom = false;
bool may_swap = true;
bool drained = false;
unsigned long pflags;
@@ -2614,15 +2615,6 @@ retry:
goto force;
/*
- * Unlike in global OOM situations, memcg is not in a physical
- * memory shortage. Allow dying and OOM-killed tasks to
- * bypass the last charges so that they can exit quickly and
- * free their memory.
- */
- if (unlikely(should_force_charge()))
- goto force;
-
- /*
* Prevent unbounded recursion when reclaim operations need to
* allocate memory. This might exceed the limits temporarily,
* but we prefer facilitating memory reclaim and getting back
@@ -2679,8 +2671,9 @@ retry:
if (gfp_mask & __GFP_RETRY_MAYFAIL)
goto nomem;
- if (fatal_signal_pending(current))
- goto force;
+ /* Avoid endless loop for tasks bypassed by the oom killer */
+ if (passed_oom && task_is_dying())
+ goto nomem;
/*
* keep retrying as long as the memcg oom killer is able to make
@@ -2689,14 +2682,10 @@ retry:
*/
oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
get_order(nr_pages * PAGE_SIZE));
- switch (oom_status) {
- case OOM_SUCCESS:
+ if (oom_status == OOM_SUCCESS) {
+ passed_oom = true;
nr_retries = MAX_RECLAIM_RETRIES;
goto retry;
- case OOM_FAILED:
- goto force;
- default:
- goto nomem;
}
nomem:
if (!(gfp_mask & __GFP_NOFAIL))
_
Patches currently in -mm which might be from vvs(a)virtuozzo.com are
mm-oom-pagefault_out_of_memory-dont-force-global-oom-for-dying-tasks.patch
memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch
mm-vmalloc-repair-warn_allocs-in-__vmalloc_area_node.patch
vmalloc-back-off-when-the-current-task-is-oom-killed.patch
The patch titled
Subject: mm, oom: do not trigger out_of_memory from the #PF
has been added to the -mm tree. Its filename is
mm-oom-do-not-trigger-out_of_memory-from-the-pf.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-oom-do-not-trigger-out_of_memo…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-oom-do-not-trigger-out_of_memo…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, oom: do not trigger out_of_memory from the #PF
Any allocation failure during the #PF path will return with VM_FAULT_OOM
which in turn results in pagefault_out_of_memory. This can happen for 2
different reasons. a) Memcg is out of memory and we rely on
mem_cgroup_oom_synchronize to perform the memcg OOM handling or b) normal
allocation fails.
The latter is quite problematic because allocation paths already trigger
out_of_memory and the page allocator tries really hard to not fail
allocations. Anyway, if the OOM killer has been already invoked there is
no reason to invoke it again from the #PF path. Especially when the OOM
condition might be gone by that time and we have no way to find out other
than allocate.
Moreover if the allocation failed and the OOM killer hasn't been invoked
then we are unlikely to do the right thing from the #PF context because we
have already lost the allocation context and restrictions and therefore
might oom kill a task from a different NUMA domain.
This all suggests that there is no legitimate reason to trigger
out_of_memory from pagefault_out_of_memory so drop it. Just to be sure
that no #PF path returns with VM_FAULT_OOM without allocation, print a
warning that this is happening before we restart the #PF.
[VvS: #PF allocation can hit into limit of cgroup v1 kmem controller.
This is a local problem related to memcg, however, it causes unnecessary
global OOM kills that are repeated over and over again and escalate into a
real disaster. This has been broken since kmem accounting has been
introduced for cgroup v1 (3.8). There was no kmem specific reclaim for
the separate limit so the only way to handle kmem hard limit was to return
with ENOMEM. In upstream the problem will be fixed by removing the
outdated kmem limit, however stable and LTS kernels cannot do it and are
still affected. This patch fixes the problem and should be backported
into stable/LTS.]
Link: https://lkml.kernel.org/r/f5fd8dd8-0ad4-c524-5f65-920b01972a42@virtuozzo.com
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Signed-off-by: Vasily Averin <vvs(a)virtuozzo.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: Uladzislau Rezki <urezki(a)gmail.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/oom_kill.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
--- a/mm/oom_kill.c~mm-oom-do-not-trigger-out_of_memory-from-the-pf
+++ a/mm/oom_kill.c
@@ -1120,19 +1120,15 @@ bool out_of_memory(struct oom_control *o
}
/*
- * The pagefault handler calls here because it is out of memory, so kill a
- * memory-hogging task. If oom_lock is held by somebody else, a parallel oom
- * killing is already in progress so do nothing.
+ * The pagefault handler calls here because some allocation has failed. We have
+ * to take care of the memcg OOM here because this is the only safe context without
+ * any locks held but let the oom killer triggered from the allocation context care
+ * about the global OOM.
*/
void pagefault_out_of_memory(void)
{
- struct oom_control oc = {
- .zonelist = NULL,
- .nodemask = NULL,
- .memcg = NULL,
- .gfp_mask = 0,
- .order = 0,
- };
+ static DEFINE_RATELIMIT_STATE(pfoom_rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
if (mem_cgroup_oom_synchronize(true))
return;
@@ -1140,10 +1136,8 @@ void pagefault_out_of_memory(void)
if (fatal_signal_pending(current))
return;
- if (!mutex_trylock(&oom_lock))
- return;
- out_of_memory(&oc);
- mutex_unlock(&oom_lock);
+ if (__ratelimit(&pfoom_rs))
+ pr_warn("Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF\n");
}
SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-oom-do-not-trigger-out_of_memory-from-the-pf.patch
mm-vmalloc-be-more-explicit-about-supported-gfp-flags.patch
The read-only THP for filesystems will collapse THP for files opened
readonly and mapped with VM_EXEC; the intended usecase is to avoid TLB
misses for large text segments. But it doesn't restrict the file types,
so a THP could be collapsed for a non-regular file, for example a block
device, if it is opened readonly and mapped with EXEC permission. This
may cause bugs, like [1] and [2].
This is definitely not the intended usecase, so just collapse THP for
regular files in order to close the attack surface.
[1] https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx…
[2] https://lore.kernel.org/linux-mm/000000000000c6a82505ce284e4c@google.com/
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Reported-by: syzbot+aae069be1de40fb11825(a)syzkaller.appspotmail.com
Cc: Hao Sun <sunhao.th(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Andrea Righi <andrea.righi(a)canonical.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
---
The patch is basically based on the proposal from Hugh
(https://lore.kernel.org/linux-mm/a07564a3-b2fc-9ffe-3ace-3f276075ea5c@googl…).
It seems Hugh is too busy to prepare the patch for formal submission (I
didn't hear back after pinging him a couple of times on the mailing
list), so I prepared the patch and added his SOB.
mm/khugepaged.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 045cc579f724..e91b7271275e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -445,22 +445,25 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
if (!transhuge_vma_enabled(vma, vm_flags))
return false;
- /* Enabled via shmem mount options or sysfs settings. */
- if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
+ if (vma->vm_file)
return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
HPAGE_PMD_NR);
- }
+
+ /* Enabled via shmem mount options or sysfs settings. */
+ if (shmem_file(vma->vm_file))
+ return shmem_huge_enabled(vma);
/* THP settings require madvise. */
if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
return false;
- /* Read-only file mappings need to be aligned for THP to work. */
+ /* Only regular file is valid */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
- !inode_is_open_for_write(vma->vm_file->f_inode) &&
(vm_flags & VM_EXEC)) {
- return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
- HPAGE_PMD_NR);
+ struct inode *inode = vma->vm_file->f_inode;
+
+ return !inode_is_open_for_write(inode) &&
+ S_ISREG(inode->i_mode);
}
if (!vma->anon_vma || vma->vm_ops)
--
2.26.2
The patch titled
Subject: mm, thp: fix incorrect unmap behavior for private pages
has been added to the -mm tree. Its filename is
mm-thp-fix-incorrect-unmap-behavior-for-private-pages.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-fix-incorrect-unmap-behavi…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-fix-incorrect-unmap-behavi…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Subject: mm, thp: fix incorrect unmap behavior for private pages
When truncating the pagecache of a file THP, the private pages of a
process should not be unmapped. This incorrect behavior on dynamic
shared libraries will cause related processes to core dump.
A simple test for a DSO (prerequisite: the DSO is mapped as file THP):
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;

	/* Opening the DSO for writing triggers the pagecache truncation. */
	fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	close(fd);
	return 0;
}
The test only opens a target DSO and does nothing else. But this
operation will lead one or more processes to core dump. This patch
fixes that bug.
Link: https://lkml.kernel.org/r/20211025092134.18562-3-rongwei.wang@linux.alibaba…
Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs")
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Tested-by: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Song Liu <song(a)kernel.org>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Collin Fijalkovich <cfijalkovich(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/open.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/fs/open.c~mm-thp-fix-incorrect-unmap-behavior-for-private-pages
+++ a/fs/open.c
@@ -857,8 +857,17 @@ static int do_dentry_open(struct file *f
*/
smp_mb();
if (filemap_nr_thps(inode->i_mapping)) {
+ struct address_space *mapping = inode->i_mapping;
+
filemap_invalidate_lock(inode->i_mapping);
- truncate_pagecache(inode, 0);
+ /*
+ * unmap_mapping_range() just needs to be called once
+ * here, because the private pages do not need to be
+ * unmapped (e.g. the data segment of dynamic
+ * shared libraries here).
+ */
+ unmap_mapping_range(mapping, 0, 0, 0);
+ truncate_inode_pages(mapping, 0);
filemap_invalidate_unlock(inode->i_mapping);
}
}
_
Patches currently in -mm which might be from rongwei.wang(a)linux.alibaba.com are
mm-thp-bail-out-early-in-collapse_file-for-writeback-page.patch
mm-thp-lock-filemap-when-truncating-page-cache.patch
mm-thp-fix-incorrect-unmap-behavior-for-private-pages.patch
mm-damon-dbgfs-remove-unnecessary-variables.patch
The patch titled
Subject: mm, thp: lock filemap when truncating page cache
has been added to the -mm tree. Its filename is
mm-thp-lock-filemap-when-truncating-page-cache.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-lock-filemap-when-truncati…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-lock-filemap-when-truncati…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Subject: mm, thp: lock filemap when truncating page cache
Patch series "fix two bugs for file THP".
Transparent huge pages have supported read-only non-shmem files. The
file-backed THP is collapsed by khugepaged and truncated when written
(for shared libraries).
However, there is a race when multiple writers truncate the same page
cache concurrently.
In that case, subpage(s) of file THP can be revealed by find_get_entry in
truncate_inode_pages_range, which will trigger PageTail BUG_ON in
truncate_inode_page, as follows.
page:000000009e420ff2 refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff pfn:0x50c3ff
head:0000000075ff816d order:9 compound_mapcount:0 compound_pincount:0
flags: 0x37fffe0000010815(locked|uptodate|lru|arch_1|head)
raw: 37fffe0000000000 fffffe0013108001 dead000000000122 dead000000000400
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
head: 37fffe0000010815 fffffe001066bd48 ffff000404183c20 0000000000000000
head: 0000000000000600 0000000000000000 00000001ffffffff ffff000c0345a000
page dumped because: VM_BUG_ON_PAGE(PageTail(page))
------------[ cut here ]------------
kernel BUG at mm/truncate.c:213!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in: xfs(E) libcrc32c(E) rfkill(E) ...
CPU: 14 PID: 11394 Comm: check_madvise_d Kdump: ...
Hardware name: ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : truncate_inode_page+0x64/0x70
lr : truncate_inode_page+0x64/0x70
sp : ffff80001b60b900
x29: ffff80001b60b900 x28: 00000000000007ff
x27: ffff80001b60b9a0 x26: 0000000000000000
x25: 000000000000000f x24: ffff80001b60b9a0
x23: ffff80001b60ba18 x22: ffff0001e0999ea8
x21: ffff0000c21db300 x20: ffffffffffffffff
x19: fffffe001310ffc0 x18: 0000000000000020
x17: 0000000000000000 x16: 0000000000000000
x15: ffff0000c21db960 x14: 3030306666666620
x13: 6666666666666666 x12: 3130303030303030
x11: ffff8000117b69b8 x10: 00000000ffff8000
x9 : ffff80001012690c x8 : 0000000000000000
x7 : ffff8000114f69b8 x6 : 0000000000017ffd
x5 : ffff0007fffbcbc8 x4 : ffff80001b60b5c0
x3 : 0000000000000001 x2 : 0000000000000000
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
truncate_inode_page+0x64/0x70
truncate_inode_pages_range+0x550/0x7e4
truncate_pagecache+0x58/0x80
do_dentry_open+0x1e4/0x3c0
vfs_open+0x38/0x44
do_open+0x1f0/0x310
path_openat+0x114/0x1dc
do_filp_open+0x84/0x134
do_sys_openat2+0xbc/0x164
__arm64_sys_openat+0x74/0xc0
el0_svc_common.constprop.0+0x88/0x220
do_el0_svc+0x30/0xa0
el0_svc+0x20/0x30
el0_sync_handler+0x1a4/0x1b0
el0_sync+0x180/0x1c0
Code: aa0103e0 900061e1 910ec021 9400d300 (d4210000)
---[ end trace f70cdb42cb7c2d42 ]---
Kernel panic - not syncing: Oops - BUG: Fatal exception
This patch locks the filemap when entering truncate_pagecache(),
avoiding concurrent truncation of the same page cache.
Link: https://lkml.kernel.org/r/20211025092134.18562-2-rongwei.wang@linux.alibaba…
Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs")
Signed-off-by: Xu Yu <xuyu(a)linux.alibaba.com>
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Suggested-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Tested-by: Song Liu <song(a)kernel.org>
Cc: Collin Fijalkovich <cfijalkovich(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/open.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/fs/open.c~mm-thp-lock-filemap-when-truncating-page-cache
+++ a/fs/open.c
@@ -856,8 +856,11 @@ static int do_dentry_open(struct file *f
* of THPs into the page cache will fail.
*/
smp_mb();
- if (filemap_nr_thps(inode->i_mapping))
+ if (filemap_nr_thps(inode->i_mapping)) {
+ filemap_invalidate_lock(inode->i_mapping);
truncate_pagecache(inode, 0);
+ filemap_invalidate_unlock(inode->i_mapping);
+ }
}
return 0;
_
Patches currently in -mm which might be from rongwei.wang(a)linux.alibaba.com are
mm-thp-bail-out-early-in-collapse_file-for-writeback-page.patch
mm-thp-lock-filemap-when-truncating-page-cache.patch
mm-thp-fix-incorrect-unmap-behavior-for-private-pages.patch
mm-damon-dbgfs-remove-unnecessary-variables.patch
The patch titled
Subject: ocfs2: fix data corruption on truncate
has been added to the -mm tree. Its filename is
ocfs2-fix-data-corruption-on-truncate.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/ocfs2-fix-data-corruption-on-trun…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-fix-data-corruption-on-trun…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: ocfs2: fix data corruption on truncate
Patch series "ocfs2: Truncate data corruption fix".
As further testing has shown, commit 5314454ea3f ("ocfs2: fix data
corruption after conversion from inline format") didn't fix all the data
corruption issues the customer started observing after 6dbf7bb55598 ("fs:
Don't invalidate page buffers in block_write_full_page()") This time I
have tracked them down to two bugs in ocfs2 truncation code.
One bug (truncating page cache before clearing tail cluster and setting
i_size) could cause data corruption even before 6dbf7bb55598, but before
that commit it needed a race with a page fault; after 6dbf7bb55598 it
became pretty deterministic.
Another bug (zeroing pages beyond the old i_size) used to be a harmless
inefficiency before commit 6dbf7bb55598. But after commit 6dbf7bb55598 in
combination with the first bug it resulted in deterministic data
corruption.
Although fixing only the first problem is needed to stop data corruption,
I've fixed both issues to make the code more robust.
This patch (of 2):
ocfs2_truncate_file() unmapped and invalidated page cache pages before
zeroing the partial tail cluster and setting i_size. Thus some pages
could be left (and likely were left if the cluster zeroing happened) in
the page cache beyond i_size after truncate finished, letting the user
possibly see stale data once the file was extended again. Also the
tail cluster zeroing was not guaranteed to finish before truncate
finished, causing possible stale data exposure. The problem became
particularly easy to hit after commit 6dbf7bb55598 "fs: Don't
invalidate page buffers in block_write_full_page()" stopped the
invalidation of pages beyond i_size from the page writeback path.
Fix these problems by unmapping and invalidating pages in the page cache
after the i_size is reduced and tail cluster is zeroed out.
Link: https://lkml.kernel.org/r/20211025150008.29002-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20211025151332.11301-1-jack@suse.cz
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Joseph Qi <jiangqi903(a)gmail.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/file.c~ocfs2-fix-data-corruption-on-truncate
+++ a/fs/ocfs2/file.c
@@ -476,10 +476,11 @@ int ocfs2_truncate_file(struct inode *in
* greater than page size, so we have to truncate them
* anyway.
*/
- unmap_mapping_range(inode->i_mapping, new_i_size + PAGE_SIZE - 1, 0, 1);
- truncate_inode_pages(inode->i_mapping, new_i_size);
if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) {
+ unmap_mapping_range(inode->i_mapping,
+ new_i_size + PAGE_SIZE - 1, 0, 1);
+ truncate_inode_pages(inode->i_mapping, new_i_size);
status = ocfs2_truncate_inline(inode, di_bh, new_i_size,
i_size_read(inode), 1);
if (status)
@@ -498,6 +499,9 @@ int ocfs2_truncate_file(struct inode *in
goto bail_unlock_sem;
}
+ unmap_mapping_range(inode->i_mapping, new_i_size + PAGE_SIZE - 1, 0, 1);
+ truncate_inode_pages(inode->i_mapping, new_i_size);
+
status = ocfs2_commit_truncate(osb, inode, di_bh);
if (status < 0) {
mlog_errno(status);
_
Patches currently in -mm which might be from jack(a)suse.cz are
ocfs2-fix-data-corruption-on-truncate.patch
ocfs2-do-not-zero-pages-beyond-i_size.patch
The patch titled
Subject: mm: khugepaged: skip huge page collapse for special files
has been added to the -mm tree. Its filename is
mm-khugepaged-skip-huge-page-collapse-for-special-files.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-skip-huge-page-coll…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-skip-huge-page-coll…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yang Shi <shy828301(a)gmail.com>
Subject: mm: khugepaged: skip huge page collapse for special files
The read-only THP for filesystems will collapse THP for files opened
readonly and mapped with VM_EXEC. The intended usecase is to avoid TLB
misses for large text segments. But it doesn't restrict the file types so
a THP could be collapsed for a non-regular file, for example, block
device, if it is opened readonly and mapped with EXEC permission. This
may cause bugs, like [1] and [2].
This is definitely not the intended usecase, so just collapse THP for
regular files in order to close the attack surface.
[1] https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx…
[2] https://lore.kernel.org/linux-mm/000000000000c6a82505ce284e4c@google.com/
Link: https://lkml.kernel.org/r/20211027195221.3825-1-shy828301@gmail.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Reported-by: syzbot+aae069be1de40fb11825(a)syzkaller.appspotmail.com
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Andrea Righi <andrea.righi(a)canonical.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
--- a/mm/khugepaged.c~mm-khugepaged-skip-huge-page-collapse-for-special-files
+++ a/mm/khugepaged.c
@@ -445,22 +445,25 @@ static bool hugepage_vma_check(struct vm
if (!transhuge_vma_enabled(vma, vm_flags))
return false;
- /* Enabled via shmem mount options or sysfs settings. */
- if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
+ if (vma->vm_file)
return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
HPAGE_PMD_NR);
- }
+
+ /* Enabled via shmem mount options or sysfs settings. */
+ if (shmem_file(vma->vm_file))
+ return shmem_huge_enabled(vma);
/* THP settings require madvise. */
if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
return false;
- /* Read-only file mappings need to be aligned for THP to work. */
+ /* Only regular file is valid */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
- !inode_is_open_for_write(vma->vm_file->f_inode) &&
(vm_flags & VM_EXEC)) {
- return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
- HPAGE_PMD_NR);
+ struct inode *inode = vma->vm_file->f_inode;
+
+ return !inode_is_open_for_write(inode) &&
+ S_ISREG(inode->i_mode);
}
if (!vma->anon_vma || vma->vm_ops)
_
Patches currently in -mm which might be from shy828301(a)gmail.com are
mm-hwpoison-remove-the-unnecessary-thp-check.patch
mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault.patch
mm-khugepaged-skip-huge-page-collapse-for-special-files.patch
mm-khugepaged-skip-huge-page-collapse-for-special-files-fix.patch
mm-filemap-coding-style-cleanup-for-filemap_map_pmd.patch
mm-hwpoison-refactor-refcount-check-handling.patch
mm-shmem-dont-truncate-page-if-memory-failure-happens.patch
mm-hwpoison-handle-non-anonymous-thp-correctly.patch
mm-migrate-make-demotion-knob-depend-on-migration.patch
The patch titled
Subject: mm: khugepaged: skip huge page collapse for special files
has been removed from the -mm tree. Its filename was
mm-khugepaged-skip-huge-page-collapse-for-special-files.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
Patches currently in -mm which might be from shy828301(a)gmail.com are
mm-hwpoison-remove-the-unnecessary-thp-check.patch
mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault.patch
mm-filemap-coding-style-cleanup-for-filemap_map_pmd.patch
mm-hwpoison-refactor-refcount-check-handling.patch
mm-shmem-dont-truncate-page-if-memory-failure-happens.patch
mm-hwpoison-handle-non-anonymous-thp-correctly.patch
mm-migrate-make-demotion-knob-depend-on-migration.patch
The patch titled
Subject: mm: khugepaged: skip huge page collapse for special files
has been added to the -mm tree. Its filename is
mm-khugepaged-skip-huge-page-collapse-for-special-files.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-skip-huge-page-coll…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-skip-huge-page-coll…
------------------------------------------------------
Patches currently in -mm which might be from shy828301(a)gmail.com are
mm-hwpoison-remove-the-unnecessary-thp-check.patch
mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault.patch
mm-khugepaged-skip-huge-page-collapse-for-special-files.patch
mm-filemap-coding-style-cleanup-for-filemap_map_pmd.patch
mm-hwpoison-refactor-refcount-check-handling.patch
mm-shmem-dont-truncate-page-if-memory-failure-happens.patch
mm-hwpoison-handle-non-anonymous-thp-correctly.patch
mm-migrate-make-demotion-knob-depend-on-migration.patch
Add the missing endpoint max-packet sanity check to probe() to avoid
division by zero in ath10k_usb_hif_tx_sg() in case a malicious device
has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
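As a minimal sketch of the failure mode (hypothetical helper, not the
driver's actual transmit path): any packet-count calculation that divides
by the endpoint's max-packet size faults when a crafted descriptor reports
zero, hence the probe-time check.

#include <linux/usb.h>
#include <linux/kernel.h>

/* Hypothetical sketch, not ath10k code: dividing by a
 * descriptor-supplied max-packet size of 0 would fault. */
static int sketch_packet_count(struct usb_endpoint_descriptor *endpoint,
			       unsigned int len)
{
	int maxp = usb_endpoint_maxp(endpoint);

	if (maxp == 0)			/* broken descriptor: skip it */
		return -EINVAL;

	return DIV_ROUND_UP(len, maxp);	/* safe only after the check */
}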
Fixes: 4db66499df91 ("ath10k: add initial USB support")
Cc: stable(a)vger.kernel.org # 4.14
Cc: Erik Stromdahl <erik.stromdahl(a)gmail.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/usb.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/usb.c b/drivers/net/wireless/ath/ath10k/usb.c
index 6d831b098cbb..3d98f19c6ec8 100644
--- a/drivers/net/wireless/ath/ath10k/usb.c
+++ b/drivers/net/wireless/ath/ath10k/usb.c
@@ -853,6 +853,11 @@ static int ath10k_usb_setup_pipe_resources(struct ath10k *ar,
 				   le16_to_cpu(endpoint->wMaxPacketSize),
 				   endpoint->bInterval);
 		}
+
+		/* Ignore broken descriptors. */
+		if (usb_endpoint_maxp(endpoint) == 0)
+			continue;
+
 		urbcount = 0;
 
 		pipe_num =
--
2.32.0
When perf.data is not written cleanly, we would like to process existing
data as much as possible (please see f_header.data.size == 0 condition
in perf_session__read_header). However, perf.data with partial data may
crash perf. Specifically, we see a crash in perf-script when
session->header.env.arch is NULL.
Fix this by checking session->header.env.arch before using it to determine
native_arch. Also split the if condition so it is easier to read.
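The crash is the classic NULL-argument-to-strcmp() pattern; a minimal
sketch (hypothetical helper, not the actual perf code) of the guarded
comparison the fix introduces:

#include <string.h>
#include <stdbool.h>

/* Hypothetical sketch: decide whether the recorded arch matches the
 * running machine while tolerating a missing (NULL) arch from a
 * partially written perf.data header. */
static bool arch_is_native(const char *uts_machine, const char *env_arch)
{
	if (!env_arch)		/* partial header: arch was never written */
		return false;
	if (!strcmp(uts_machine, env_arch))
		return true;
	/* 32-bit perf.data processed on an x86_64 host is still native */
	return !strcmp(uts_machine, "x86_64") && !strcmp(env_arch, "i386");
}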
Cc: stable(a)vger.kernel.org
Signed-off-by: Song Liu <songliubraving(a)fb.com>
---
tools/perf/builtin-script.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6211d0b84b7a6..7821f6740ac1d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -4039,12 +4039,17 @@ int cmd_script(int argc, const char **argv)
 		goto out_delete;
 
 	uname(&uts);
-	if (data.is_pipe || /* assume pipe_mode indicates native_arch */
-	    !strcmp(uts.machine, session->header.env.arch) ||
-	    (!strcmp(uts.machine, "x86_64") &&
-	     !strcmp(session->header.env.arch, "i386")))
+	if (data.is_pipe) /* assume pipe_mode indicates native_arch */
 		native_arch = true;
+	if (session->header.env.arch) {
+		if (!strcmp(uts.machine, session->header.env.arch))
+			native_arch = true;
+		else if (!strcmp(uts.machine, "x86_64") &&
+			 !strcmp(session->header.env.arch, "i386"))
+			native_arch = true;
+	}
+
 	script.session = session;
 	script__setup_sample_type(&script);
--
2.30.2
From: Pavel Begunkov <asml.silence(a)gmail.com>
[ Upstream commit 792bb6eb862333658bf1bd2260133f0507e2da8d ]
[ 97.866748] a.out/2890 is trying to acquire lock:
[ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at:
io_wq_submit_work+0x155/0x240
[ 97.869735]
[ 97.869735] but task is already holding lock:
[ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at:
__x64_sys_io_uring_enter+0x3f0/0x5b0
[ 97.873074]
[ 97.873074] other info that might help us debug this:
[ 97.874520] Possible unsafe locking scenario:
[ 97.874520]
[ 97.875845] CPU0
[ 97.876440] ----
[ 97.877048] lock(&ctx->uring_lock);
[ 97.877961] lock(&ctx->uring_lock);
[ 97.878881]
[ 97.878881] *** DEADLOCK ***
[ 97.878881]
[ 97.880341] May be due to missing lock nesting notation
[ 97.880341]
[ 97.881952] 1 lock held by a.out/2890:
[ 97.882873] #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at:
__x64_sys_io_uring_enter+0x3f0/0x5b0
[ 97.885108]
[ 97.885108] stack backtrace:
[ 97.890457] Call Trace:
[ 97.891121] dump_stack+0xac/0xe3
[ 97.891972] __lock_acquire+0xab6/0x13a0
[ 97.892940] lock_acquire+0x2c3/0x390
[ 97.894894] __mutex_lock+0xae/0x9f0
[ 97.901101] io_wq_submit_work+0x155/0x240
[ 97.902112] io_wq_cancel_cb+0x162/0x490
[ 97.904126] io_async_find_and_cancel+0x3b/0x140
[ 97.905247] io_issue_sqe+0x86d/0x13e0
[ 97.909122] __io_queue_sqe+0x10b/0x550
[ 97.913971] io_queue_sqe+0x235/0x470
[ 97.914894] io_submit_sqes+0xcce/0xf10
[ 97.917872] __x64_sys_io_uring_enter+0x3fb/0x5b0
[ 97.921424] do_syscall_64+0x2d/0x40
[ 97.922329] entry_SYSCALL_64_after_hwframe+0x44/0xa9
While holding uring_lock, e.g. from inline execution, an async cancel
request may attempt cancellations through io_wq_submit_work, which may
try to grab the same lock. Delay it to task_work, so we do it from a
clean context and don't have to worry about locking.
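A rough sketch of the deferral pattern (hypothetical names, not the
io_uring internals): instead of running the lock-taking cancellation
inline, it is queued as task_work and executed after the submission path
has dropped uring_lock.

#include <linux/task_work.h>
#include <linux/sched.h>

/* Hypothetical sketch of the deferral pattern, not io_uring code. */
static void cancel_cb(struct callback_head *cb)
{
	/* Runs later from a clean context, so taking the mutex that the
	 * submission path held at queue time no longer self-deadlocks. */
}

static int defer_cancel(struct task_struct *task, struct callback_head *cb)
{
	init_task_work(cb, cancel_cb);
	return task_work_add(task, cb, TWA_SIGNAL);
}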
Cc: <stable(a)vger.kernel.org> # 5.5+
Fixes: c07e6719511e ("io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()")
Reported-by: Abaci <abaci(a)linux.alibaba.com>
Reported-by: Hao Xu <haoxu(a)linux.alibaba.com>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
[Lee: The first hunk solves a different (double free) issue in v5.10.
Only the first hunk of the original patch is relevant to v5.10 AND
the first hunk of the original patch is only relevant to v5.10]
Reported-by: syzbot+59d8a1f4e60c20c066cf(a)syzkaller.appspotmail.com
Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
---
fs/io_uring.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 26753d0cb4312..361f8ae96c36f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2075,7 +2075,9 @@ static void io_req_task_cancel(struct callback_head *cb)
 	struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);
 	struct io_ring_ctx *ctx = req->ctx;
 
+	mutex_lock(&ctx->uring_lock);
 	__io_req_task_cancel(req, -ECANCELED);
+	mutex_unlock(&ctx->uring_lock);
 	percpu_ref_put(&ctx->refs);
 }
--
2.33.0.1079.g6e70778dc9-goog
If the guest requests string I/O from the hypervisor via VMGEXIT,
SW_EXITINFO2 will contain the REP count. However, sev_es_string_io
was incorrectly treating it as the size of the GHCB buffer in
bytes.
This fixes the "outsw" test in the experimental SEV tests of
kvm-unit-tests.
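The conversion the fix performs is count-to-bytes with an explicit
overflow guard; a standalone sketch (hypothetical helper, same
check_mul_overflow() idiom) of that validation:

#include <linux/overflow.h>
#include <linux/types.h>
#include <linux/limits.h>
#include <linux/errno.h>

/* Hypothetical sketch: turn a guest-supplied REP count into a byte
 * length, rejecting anything that cannot be represented in an int. */
static int rep_count_to_bytes(u64 rep_count, int size, int *bytes)
{
	int count;

	if (rep_count > INT_MAX)	/* guest-controlled: validate first */
		return -EINVAL;

	count = rep_count;
	if (check_mul_overflow(count, size, bytes))
		return -EINVAL;		/* count * size overflows int */

	return 0;
}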
Cc: stable(a)vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Marc Orr <marcorr(a)google.com>
Tested-by: Marc Orr <marcorr(a)google.com>
Reviewed-by: Marc Orr <marcorr(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/svm/sev.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e672493b5d8d..efd207fd335e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2579,11 +2579,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
 {
-	if (!setup_vmgexit_scratch(svm, in, svm->vmcb->control.exit_info_2))
+	int count;
+	int bytes;
+
+	if (svm->vmcb->control.exit_info_2 > INT_MAX)
+		return -EINVAL;
+
+	count = svm->vmcb->control.exit_info_2;
+	if (unlikely(check_mul_overflow(count, size, &bytes)))
+		return -EINVAL;
+
+	if (!setup_vmgexit_scratch(svm, in, bytes))
 		return -EINVAL;
 
-	return kvm_sev_es_string_io(&svm->vcpu, size, port,
-				    svm->ghcb_sa, svm->ghcb_sa_len / size, in);
+	return kvm_sev_es_string_io(&svm->vcpu, size, port, svm->ghcb_sa, count, in);
 }
 
 void sev_es_init_vmcb(struct vcpu_svm *svm)
--
2.27.0
This simply adds proper support for panel backlights that can be controlled
via VESA's backlight control protocol, but which also require that we
enable and disable the backlight via PWM instead of via the DPCD interface.
We also enable this by default, in order to fix some people's backlights
that were broken by not having this enabled.
For reference, backlights that require this and use VESA's backlight
interface tend to be laptops with hybrid GPUs, but this very well may
change in the future.
v4:
* Make sure that we call intel_backlight_level_to_pwm() in
intel_dp_aux_vesa_enable_backlight() - vsyrjala
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/3680
Fixes: fe7d52bccab6 ("drm/i915/dp: Don't use DPCD backlights that need PWM enable/disable")
Cc: <stable(a)vger.kernel.org> # v5.12+
---
.../drm/i915/display/intel_dp_aux_backlight.c | 27 ++++++++++++++-----
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
index 569d17b4d00f..f05b71c01b8e 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
@@ -293,6 +293,13 @@ intel_dp_aux_vesa_enable_backlight(const struct intel_crtc_state *crtc_state,
 	struct intel_panel *panel = &connector->panel;
 	struct intel_dp *intel_dp = enc_to_intel_dp(connector->encoder);
 
+	if (!panel->backlight.edp.vesa.info.aux_enable) {
+		u32 pwm_level = intel_backlight_invert_pwm_level(connector,
+								 panel->backlight.pwm_level_max);
+
+		panel->backlight.pwm_funcs->enable(crtc_state, conn_state, pwm_level);
+	}
+
 	drm_edp_backlight_enable(&intel_dp->aux, &panel->backlight.edp.vesa.info, level);
 }
 
@@ -304,6 +311,10 @@ static void intel_dp_aux_vesa_disable_backlight(const struct drm_connector_state
 	struct intel_dp *intel_dp = enc_to_intel_dp(connector->encoder);
 
 	drm_edp_backlight_disable(&intel_dp->aux, &panel->backlight.edp.vesa.info);
+
+	if (!panel->backlight.edp.vesa.info.aux_enable)
+		panel->backlight.pwm_funcs->disable(old_conn_state,
+						    intel_backlight_invert_pwm_level(connector, 0));
 }
 
 static int intel_dp_aux_vesa_setup_backlight(struct intel_connector *connector, enum pipe pipe)
@@ -321,6 +332,15 @@ static int intel_dp_aux_vesa_setup_backlight(struct intel_connector *connector,
 	if (ret < 0)
 		return ret;
 
+	if (!panel->backlight.edp.vesa.info.aux_enable) {
+		ret = panel->backlight.pwm_funcs->setup(connector, pipe);
+		if (ret < 0) {
+			drm_err(&i915->drm,
+				"Failed to setup PWM backlight controls for eDP backlight: %d\n",
+				ret);
+			return ret;
+		}
+	}
 	panel->backlight.max = panel->backlight.edp.vesa.info.max;
 	panel->backlight.min = 0;
 	if (current_mode == DP_EDP_BACKLIGHT_CONTROL_MODE_DPCD) {
@@ -340,12 +360,7 @@ intel_dp_aux_supports_vesa_backlight(struct intel_connector *connector)
 	struct intel_dp *intel_dp = intel_attached_dp(connector);
 	struct drm_i915_private *i915 = dp_to_i915(intel_dp);
 
-	/* TODO: We currently only support AUX only backlight configurations, not backlights which
-	 * require a mix of PWM and AUX controls to work. In the mean time, these machines typically
-	 * work just fine using normal PWM controls anyway.
-	 */
-	if ((intel_dp->edp_dpcd[1] & DP_EDP_BACKLIGHT_AUX_ENABLE_CAP) &&
-	    drm_edp_backlight_supported(intel_dp->edp_dpcd)) {
+	if (drm_edp_backlight_supported(intel_dp->edp_dpcd)) {
 		drm_dbg_kms(&i915->drm, "AUX Backlight Control Supported!\n");
 		return true;
 	}
--
2.31.1
I'm announcing the release of the 4.14.253 kernel.
All users of the 4.14 kernel series must upgrade.
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/Kconfig | 1
arch/arm/boot/dts/at91-sama5d27_som1_ek.dts | 1
arch/arm/boot/dts/spear3xx.dtsi | 2
arch/nios2/include/asm/irqflags.h | 4 -
arch/nios2/include/asm/registers.h | 2
arch/xtensa/platforms/xtfpga/setup.c | 12 ++-
drivers/isdn/capi/kcapi.c | 5 +
drivers/isdn/hardware/mISDN/netjet.c | 2
drivers/net/can/rcar/rcar_can.c | 20 +++--
drivers/net/can/sja1000/peak_pci.c | 9 +-
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 5 -
drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c | 1
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 ++
drivers/net/phy/mdio_bus.c | 1
drivers/platform/x86/intel_scu_ipc.c | 2
drivers/scsi/hosts.c | 3
fs/btrfs/file.c | 56 +--------------
fs/btrfs/tree-log.c | 47 ++++++++-----
fs/exec.c | 2
fs/nfsd/nfsctl.c | 5 +
fs/ocfs2/alloc.c | 46 +++---------
fs/ocfs2/super.c | 14 ++-
include/linux/elfcore.h | 2
kernel/trace/ftrace.c | 4 -
kernel/trace/trace.h | 64 +++++-------------
kernel/trace/trace_functions.c | 2
net/netfilter/Kconfig | 2
net/netfilter/ipvs/ip_vs_ctl.c | 5 +
net/nfc/nci/rsp.c | 2
sound/hda/hdac_controller.c | 5 -
sound/soc/soc-dapm.c | 13 ++-
sound/usb/quirks-table.h | 32 +++++++++
33 files changed, 186 insertions(+), 195 deletions(-)
Antoine Tenart (1):
netfilter: ipvs: make global sysctl readonly in non-init netns
Benjamin Coddington (1):
NFSD: Keep existing listeners on portlist error
Brendan Grieve (1):
ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset
Dexuan Cui (1):
scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
Eugen Hristev (1):
ARM: dts: at91: sama5d2_som1_ek: disable ISC node by default
Filipe Manana (1):
btrfs: deal with errors when checking if a dir entry exists during log replay
Greg Kroah-Hartman (1):
Linux 4.14.253
Guenter Roeck (1):
xtensa: xtfpga: Try software restart before simulating CPU reset
Herve Codina (2):
net: stmmac: add support for dwmac 3.40a
ARM: dts: spear3xx: Fix gmac node
Jan Kara (1):
ocfs2: fix data corruption after conversion from inline format
Josef Bacik (1):
btrfs: always wait on ordered extents at fsync time
Kai Vehmanen (1):
ALSA: hda: avoid write to STATESTS if controller is in reset
Lin Ma (1):
nfc: nci: fix the UAF of rf_conn_info object
Lukas Bulwahn (1):
elfcore: correct reference to CONFIG_UML
Matthew Wilcox (Oracle) (1):
vfs: check fd has read access in kernel_read_file_from_fd()
Max Filippov (1):
xtensa: xtfpga: use CONFIG_USE_OF instead of CONFIG_OF
Nick Desaulniers (1):
ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
Prashant Malani (1):
platform/x86: intel_scu_ipc: Update timeout value in comment
Randy Dunlap (1):
NIOS2: irqflags: rename a redefined register name
Stephane Grosjean (1):
can: peak_usb: pcan_usb_fd_decode_status(): fix back to ERROR_ACTIVE state notification
Steven Rostedt (VMware) (1):
tracing: Have all levels of checks prevent recursion
Takashi Iwai (1):
ASoC: DAPM: Fix missing kctl change notifications
Valentin Vidic (1):
ocfs2: mount fails with buffer overflow in strlen
Vegard Nossum (1):
netfilter: Kconfig: use 'default y' instead of 'm' for bool config option
Xiaolong Huang (1):
isdn: cpai: check ctr->cnr to avoid array index out of bound
Yanfei Xu (1):
net: mdiobus: Fix memory leak in __mdiobus_register
Yoshihiro Shimoda (1):
can: rcar_can: fix suspend/resume
Zheyu Ma (2):
can: peak_pci: peak_pci_remove(): fix UAF
isdn: mISDN: Fix sleeping function called from invalid context
The previous commit fixed handling of incomplete packets but broke error
handling: offsetof returns an unsigned value (size_t), but when compared
against the signed return value, the return value is interpreted as if
it were unsigned, so negative return values are never less than the
offset.
To make the code easier to read, calculate the minimal packet length
once and separately, and assign it to a signed int variable to eliminate
unsigned math and the need for type casts. It then becomes immediately
obvious how the actual data length is calculated and why the return
value cannot be less than the minimal length.
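The pitfall generalizes to any signed-vs-size_t comparison in C: the usual
arithmetic conversions turn the signed operand unsigned, so a negative
error code compares as a huge positive value. A standalone sketch
(hypothetical struct, outside the driver):

#include <stdio.h>
#include <stddef.h>

struct msg { int hdr; char data[57]; };

int main(void)
{
	int ret = -5;	/* e.g. an error code from a failed receive */

	/* offsetof() yields size_t, so ret is converted to unsigned:
	 * -5 becomes a huge value and the error branch is skipped. */
	if (ret < offsetof(struct msg, data))
		printf("error caught\n");	/* never printed */

	/* Assigning the offset to a signed int keeps the math signed. */
	int min_length = offsetof(struct msg, data);

	if (ret < min_length)
		printf("error caught correctly\n");	/* prints */
	return 0;
}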
Fixes: 22d65765f211 ("HID: u2fzero: ignore incomplete packets without data")
Fixes: 42337b9d4d95 ("HID: add driver for U2F Zero built-in LED and RNG")
Signed-off-by: Andrej Shadura <andrew.shadura(a)collabora.co.uk>
---
drivers/hid/hid-u2fzero.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/hid/hid-u2fzero.c b/drivers/hid/hid-u2fzero.c
index d70cd3d7f583..94f78ffb76d0 100644
--- a/drivers/hid/hid-u2fzero.c
+++ b/drivers/hid/hid-u2fzero.c
@@ -191,6 +191,8 @@ static int u2fzero_rng_read(struct hwrng *rng, void *data,
 	struct u2f_hid_msg resp;
 	int ret;
 	size_t actual_length;
+	/* valid packets must have a correct header */
+	int min_length = offsetof(struct u2f_hid_msg, init.data);
 
 	if (!dev->present) {
 		hid_dbg(dev->hdev, "device not present");
@@ -200,12 +202,12 @@ static int u2fzero_rng_read(struct hwrng *rng, void *data,
 	ret = u2fzero_recv(dev, &req, &resp);
 
 	/* ignore errors or packets without data */
-	if (ret < offsetof(struct u2f_hid_msg, init.data))
+	if (ret < min_length)
 		return 0;
 
 	/* only take the minimum amount of data it is safe to take */
-	actual_length = min3((size_t)ret - offsetof(struct u2f_hid_msg,
-			init.data), U2F_HID_MSG_LEN(resp), max);
+	actual_length = min3((size_t)ret - min_length,
+			     U2F_HID_MSG_LEN(resp), max);
 
 	memcpy(data, resp.init.data, actual_length);
--
2.33.0
Add the missing endpoint max-packet sanity check to probe() to avoid
division by zero in ath10k_usb_hif_tx_sg() in case a malicious device
has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: 9cbee358687e ("ath6kl: add full USB support")
Cc: stable(a)vger.kernel.org # 3.5
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/net/wireless/ath/ath6kl/usb.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/ath/ath6kl/usb.c b/drivers/net/wireless/ath/ath6kl/usb.c
index bd367b79a4d3..aba70f35e574 100644
--- a/drivers/net/wireless/ath/ath6kl/usb.c
+++ b/drivers/net/wireless/ath/ath6kl/usb.c
@@ -340,6 +340,11 @@ static int ath6kl_usb_setup_pipe_resources(struct ath6kl_usb *ar_usb)
 				   le16_to_cpu(endpoint->wMaxPacketSize),
 				   endpoint->bInterval);
 		}
+
+		/* Ignore broken descriptors. */
+		if (usb_endpoint_maxp(endpoint) == 0)
+			continue;
+
 		urbcount = 0;
 
 		pipe_num =
--
2.32.0