It is mandatory for a software to issue a reset upon modifying RGMII
Receive Timing Control and RGMII Transmit Timing Control bit fields of MAC
Specific Control register 2 (page 2, register 21) otherwise the changes
won't be perceived by the PHY (the same is applicable for a lot of other
registers). Not setting the RGMII delays on the platforms that imply it'
being done on the PHY side will consequently cause the traffic loss. We
discovered that the denoted soft-reset is missing in the
m88e1121_config_aneg() method for the case if the RGMII delays are
modified but the MDIx polarity isn't changed or the auto-negotiation is
left enabled, thus causing the traffic loss on our platform with Marvell
Alaska 88E1510 installed. Let's fix that by issuing the soft-reset if the
delays have been actually set in the m88e1121_config_aneg_rgmii_delays()
method.
Fixes: d6ab93364734 ("net: phy: marvell: Avoid unnecessary soft reset")
Signed-off-by: Pavel Parkhomenko <Pavel.Parkhomenko(a)baikalelectronics.ru>
Reviewed-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Reviewed-by: Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
Cc: stable(a)vger.kernel.org
---
Link: https://lore.kernel.org/netdev/96759fee7240fd095cb9cc1f6eaf2d9113b57cf0.cam…
Changelog v2:
- Add "net" suffix into the PATCH-clause of the subject.
- Cc the patch to the stable tree list.
- Rebase onto the latset netdev/net branch with the top commit
---
drivers/net/phy/marvell.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index fa71fb7a66b5..4b8f822f8c51 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -553,9 +553,9 @@ static int m88e1121_config_aneg_rgmii_delays(struct phy_device *phydev)
else
mscr = 0;
- return phy_modify_paged(phydev, MII_MARVELL_MSCR_PAGE,
- MII_88E1121_PHY_MSCR_REG,
- MII_88E1121_PHY_MSCR_DELAY_MASK, mscr);
+ return phy_modify_paged_changed(phydev, MII_MARVELL_MSCR_PAGE,
+ MII_88E1121_PHY_MSCR_REG,
+ MII_88E1121_PHY_MSCR_DELAY_MASK, mscr);
}
static int m88e1121_config_aneg(struct phy_device *phydev)
@@ -569,11 +569,13 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
return err;
}
+ changed = err;
+
err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
if (err < 0)
return err;
- changed = err;
+ changed |= err;
err = genphy_config_aneg(phydev);
if (err < 0)
@@ -1211,17 +1213,22 @@ static int m88e1510_config_init(struct phy_device *phydev)
static int m88e1118_config_aneg(struct phy_device *phydev)
{
+ int changed = 0;
int err;
- err = genphy_soft_reset(phydev);
+ err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
if (err < 0)
return err;
- err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
+ changed = err;
+
+ err = genphy_config_aneg(phydev);
if (err < 0)
return err;
- err = genphy_config_aneg(phydev);
+ if (phydev->autoneg != AUTONEG_ENABLE || changed)
+ return genphy_soft_reset(phydev);
+
return 0;
}
--
2.34.1
I'm announcing the release of the 5.16.7 kernel.
All users of the 5.16 kernel series must upgrade.
The updated 5.16.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.16.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 23 +++++++----------------
2 files changed, 8 insertions(+), 17 deletions(-)
Greg Kroah-Hartman (3):
Revert "drm/vc4: hdmi: Make sure the device is powered with CEC"
Revert "drm/vc4: hdmi: Make sure the device is powered with CEC" again
Linux 5.16.7
I'm announcing the release of the 5.10.98 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 23 +++++++----------------
2 files changed, 8 insertions(+), 17 deletions(-)
Greg Kroah-Hartman (3):
Revert "drm/vc4: hdmi: Make sure the device is powered with CEC"
Revert "drm/vc4: hdmi: Make sure the device is powered with CEC" again
Linux 5.10.98
I'm announcing the release of the 5.15.21 kernel.
All users of the 5.15 kernel series must upgrade.
The updated 5.15.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.15.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 23 +++++++----------------
2 files changed, 8 insertions(+), 17 deletions(-)
Greg Kroah-Hartman (3):
Revert "drm/vc4: hdmi: Make sure the device is powered with CEC"
Revert "drm/vc4: hdmi: Make sure the device is powered with CEC" again
Linux 5.15.21
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec5586b4699cfb75cdfa09425e11d121db40773 Mon Sep 17 00:00:00 2001
From: Evan Quan <evan.quan(a)amd.com>
Date: Mon, 24 Jan 2022 13:40:35 +0800
Subject: [PATCH] drm/amd/pm: correct the MGpuFanBoost support for Beige Goby
The existing way cannot handle Beige Goby well as a different
PPTable data structure(PPTable_beige_goby_t instead of PPTable_t)
is used there.
Signed-off-by: Evan Quan <evan.quan(a)amd.com>
Acked-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 777f717c37ae..a4207293158c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -3696,14 +3696,14 @@ static ssize_t sienna_cichlid_get_gpu_metrics(struct smu_context *smu,
static int sienna_cichlid_enable_mgpu_fan_boost(struct smu_context *smu)
{
- struct smu_table_context *table_context = &smu->smu_table;
- PPTable_t *smc_pptable = table_context->driver_pptable;
+ uint16_t *mgpu_fan_boost_limit_rpm;
+ GET_PPTABLE_MEMBER(MGpuFanBoostLimitRpm, &mgpu_fan_boost_limit_rpm);
/*
* Skip the MGpuFanBoost setting for those ASICs
* which do not support it
*/
- if (!smc_pptable->MGpuFanBoostLimitRpm)
+ if (*mgpu_fan_boost_limit_rpm == 0)
return 0;
return smu_cmn_send_smc_msg_with_param(smu,
This is the start of the stable review cycle for the 5.10.97 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 06 Feb 2022 09:19:05 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.97-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.97-rc1
Eric Dumazet <edumazet(a)google.com>
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
Eric Dumazet <edumazet(a)google.com>
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
Tianchen Ding <dtcccc(a)linux.alibaba.com>
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
Eric Dumazet <edumazet(a)google.com>
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
Eric Dumazet <edumazet(a)google.com>
net: sched: fix use-after-free in tc_new_tfilter()
Dan Carpenter <dan.carpenter(a)oracle.com>
fanotify: Fix stale file descriptor in copy_event_to_user()
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
net: amd-xgbe: Fix skb data length underflow
Raju Rangoju <Raju.Rangoju(a)amd.com>
net: amd-xgbe: ensure to reset the tx_timer_active flag
Georgi Valkov <gvalkov(a)abv.bg>
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
Maor Dickman <maord(a)nvidia.com>
net/mlx5: E-Switch, Fix uninitialized variable modact
Maher Sanalla <msanalla(a)nvidia.com>
net/mlx5: Use del_timer_sync in fw reset flow of halting poll
Maor Dickman <maord(a)nvidia.com>
net/mlx5e: Fix handling of wrong devices during bond netevent
Eric W. Biederman <ebiederm(a)xmission.com>
cgroup-v1: Require capabilities to set release_agent
Maxime Ripard <maxime(a)cerno.tech>
drm/vc4: hdmi: Make sure the device is powered with CEC
Tony Luck <tony.luck(a)intel.com>
x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
Tony Luck <tony.luck(a)intel.com>
x86/mce: Add Xeon Sapphire Rapids to list of CPUs that support PPIN
Namhyung Kim <namhyung(a)kernel.org>
perf/core: Fix cgroup event list management
Peter Zijlstra <peterz(a)infradead.org>
perf: Rework perf_event_exit_event()
Suren Baghdasaryan <surenb(a)google.com>
psi: Fix uaf issue when psi trigger is destroyed while being polled
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Forcibly leave nested virt when SMM state is toggled
Kevin Hilman <khilman(a)baylibre.com>
Revert "drivers: bus: simple-pm-bus: Add support for probing simple bus only devices"
Alex Elder <elder(a)linaro.org>
net: ipa: prevent concurrent replenish
Alex Elder <elder(a)linaro.org>
net: ipa: use a bitmap for endpoint replenish_enabled
Alex Elder <elder(a)linaro.org>
net: ipa: fix atomic update in ipa_endpoint_replenish()
Lukas Wunner <lukas(a)wunner.de>
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
-------------
Diffstat:
Documentation/accounting/psi.rst | 3 +-
Makefile | 4 +-
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kernel/cpu/mce/intel.c | 2 +
arch/x86/kvm/svm/nested.c | 10 +-
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/nested.c | 1 +
arch/x86/kvm/x86.c | 2 +
drivers/bus/simple-pm-bus.c | 39 +-----
drivers/gpu/drm/vc4/vc4_hdmi.c | 25 ++--
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 +-
.../net/ethernet/mellanox/mlx5/core/en/rep/bond.c | 32 ++---
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 2 +-
.../ethernet/mellanox/mlx5/core/lib/fs_chains.c | 2 +-
drivers/net/ipa/ipa_endpoint.c | 25 ++--
drivers/net/ipa/ipa_endpoint.h | 15 +-
drivers/net/usb/ipheth.c | 6 +-
drivers/pci/hotplug/pciehp_hpc.c | 7 +-
fs/notify/fanotify/fanotify_user.c | 6 +-
include/linux/perf_event.h | 1 +
include/linux/psi.h | 2 +-
include/linux/psi_types.h | 3 -
kernel/cgroup/cgroup-v1.c | 14 ++
kernel/cgroup/cgroup.c | 11 +-
kernel/cgroup/cpuset.c | 3 +-
kernel/events/core.c | 151 ++++++++++++---------
kernel/sched/psi.c | 66 ++++-----
net/core/rtnetlink.c | 6 +-
net/ipv4/tcp_input.c | 2 +
net/packet/af_packet.c | 8 +-
net/sched/cls_api.c | 11 +-
32 files changed, 264 insertions(+), 214 deletions(-)
This is the start of the stable review cycle for the 5.4.177 release.
There are 10 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 06 Feb 2022 09:19:05 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.177-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.177-rc1
Eric Dumazet <edumazet(a)google.com>
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
Tianchen Ding <dtcccc(a)linux.alibaba.com>
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
Eric Dumazet <edumazet(a)google.com>
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
Eric Dumazet <edumazet(a)google.com>
net: sched: fix use-after-free in tc_new_tfilter()
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
net: amd-xgbe: Fix skb data length underflow
Raju Rangoju <Raju.Rangoju(a)amd.com>
net: amd-xgbe: ensure to reset the tx_timer_active flag
Georgi Valkov <gvalkov(a)abv.bg>
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
Eric W. Biederman <ebiederm(a)xmission.com>
cgroup-v1: Require capabilities to set release_agent
Suren Baghdasaryan <surenb(a)google.com>
psi: Fix uaf issue when psi trigger is destroyed while being polled
Lukas Wunner <lukas(a)wunner.de>
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
-------------
Diffstat:
Documentation/accounting/psi.rst | 3 +-
Makefile | 4 +-
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 ++++++-
drivers/net/usb/ipheth.c | 6 +--
drivers/pci/hotplug/pciehp_hpc.c | 7 ++--
include/linux/psi.h | 2 +-
include/linux/psi_types.h | 3 --
kernel/cgroup/cgroup-v1.c | 14 +++++++
kernel/cgroup/cgroup.c | 11 ++++--
kernel/cgroup/cpuset.c | 3 +-
kernel/sched/psi.c | 66 ++++++++++++++------------------
net/core/rtnetlink.c | 6 ++-
net/packet/af_packet.c | 8 +++-
net/sched/cls_api.c | 11 ++++--
14 files changed, 94 insertions(+), 64 deletions(-)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bca52455a3c07922ee976714b00563a13a29ab15 Mon Sep 17 00:00:00 2001
From: Lang Yu <Lang.Yu(a)amd.com>
Date: Fri, 28 Jan 2022 18:24:53 +0800
Subject: [PATCH] drm/amdgpu: fix a potential GPU hang on cyan skillfish
We observed a GPU hang when querying GMC CG state(i.e.,
cat amdgpu_pm_info) on cyan skillfish. Acctually, cyan
skillfish doesn't support any CG features.
Just prevent it from accessing GMC CG registers.
Signed-off-by: Lang Yu <Lang.Yu(a)amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 38bb42727715..a2f8ed0e6a64 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -1140,6 +1140,9 @@ static void gmc_v10_0_get_clockgating_state(void *handle, u32 *flags)
{
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 1, 3))
+ return;
+
adev->mmhub.funcs->get_clockgating(adev, flags);
if (adev->ip_versions[ATHUB_HWIP][0] >= IP_VERSION(2, 1, 0))
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bca52455a3c07922ee976714b00563a13a29ab15 Mon Sep 17 00:00:00 2001
From: Lang Yu <Lang.Yu(a)amd.com>
Date: Fri, 28 Jan 2022 18:24:53 +0800
Subject: [PATCH] drm/amdgpu: fix a potential GPU hang on cyan skillfish
We observed a GPU hang when querying GMC CG state(i.e.,
cat amdgpu_pm_info) on cyan skillfish. Acctually, cyan
skillfish doesn't support any CG features.
Just prevent it from accessing GMC CG registers.
Signed-off-by: Lang Yu <Lang.Yu(a)amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 38bb42727715..a2f8ed0e6a64 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -1140,6 +1140,9 @@ static void gmc_v10_0_get_clockgating_state(void *handle, u32 *flags)
{
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 1, 3))
+ return;
+
adev->mmhub.funcs->get_clockgating(adev, flags);
if (adev->ip_versions[ATHUB_HWIP][0] >= IP_VERSION(2, 1, 0))
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 16 Dec 2021 19:47:35 +0800
Subject: [PATCH] btrfs: don't start transaction for scrub if the fs is mounted
read-only
[BUG]
The following super simple script would crash btrfs at unmount time, if
CONFIG_BTRFS_ASSERT() is set.
mkfs.btrfs -f $dev
mount $dev $mnt
xfs_io -f -c "pwrite 0 4k" $mnt/file
umount $mnt
mount -r ro $dev $mnt
btrfs scrub start -Br $mnt
umount $mnt
This will trigger the following ASSERT() introduced by commit
0a31daa4b602 ("btrfs: add assertion for empty list of transactions at
late stage of umount").
That patch is definitely not the cause, it just makes enough noise for
developers.
[CAUSE]
We will start transaction for the following call chain during scrub:
scrub_enumerate_chunks()
|- btrfs_inc_block_group_ro()
|- btrfs_join_transaction()
However for RO mount, there is no running transaction at all, thus
btrfs_join_transaction() will start a new transaction.
Furthermore, since it's read-only mount, btrfs_sync_fs() will not call
btrfs_commit_super() to commit the new but empty transaction.
And leads to the ASSERT().
The bug has been there for a long time. Only the new ASSERT() makes it
noisy enough to be noticed.
[FIX]
For read-only scrub on read-only mount, there is no need to start a
transaction nor to allocate new chunks in btrfs_inc_block_group_ro().
Just do extra read-only mount check in btrfs_inc_block_group_ro(), and
if it's read-only, skip all chunk allocation and go inc_block_group_ro()
directly.
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1db24e6d6d90..68feabc83a27 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
int ret;
bool dirty_bg_running;
+ /*
+ * This can only happen when we are doing read-only scrub on read-only
+ * mount.
+ * In that case we should not start a new transaction on read-only fs.
+ * Thus here we skip all chunk allocations.
+ */
+ if (sb_rdonly(fs_info->sb)) {
+ mutex_lock(&fs_info->ro_block_group_mutex);
+ ret = inc_block_group_ro(cache, 0);
+ mutex_unlock(&fs_info->ro_block_group_mutex);
+ return ret;
+ }
+
do {
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2d192fc4c1abeb0d04d1c8cd54405ff4a0b0255b Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 16 Dec 2021 19:47:35 +0800
Subject: [PATCH] btrfs: don't start transaction for scrub if the fs is mounted
read-only
[BUG]
The following super simple script would crash btrfs at unmount time, if
CONFIG_BTRFS_ASSERT() is set.
mkfs.btrfs -f $dev
mount $dev $mnt
xfs_io -f -c "pwrite 0 4k" $mnt/file
umount $mnt
mount -r ro $dev $mnt
btrfs scrub start -Br $mnt
umount $mnt
This will trigger the following ASSERT() introduced by commit
0a31daa4b602 ("btrfs: add assertion for empty list of transactions at
late stage of umount").
That patch is definitely not the cause, it just makes enough noise for
developers.
[CAUSE]
We will start transaction for the following call chain during scrub:
scrub_enumerate_chunks()
|- btrfs_inc_block_group_ro()
|- btrfs_join_transaction()
However for RO mount, there is no running transaction at all, thus
btrfs_join_transaction() will start a new transaction.
Furthermore, since it's read-only mount, btrfs_sync_fs() will not call
btrfs_commit_super() to commit the new but empty transaction.
And leads to the ASSERT().
The bug has been there for a long time. Only the new ASSERT() makes it
noisy enough to be noticed.
[FIX]
For read-only scrub on read-only mount, there is no need to start a
transaction nor to allocate new chunks in btrfs_inc_block_group_ro().
Just do extra read-only mount check in btrfs_inc_block_group_ro(), and
if it's read-only, skip all chunk allocation and go inc_block_group_ro()
directly.
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1db24e6d6d90..68feabc83a27 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
int ret;
bool dirty_bg_running;
+ /*
+ * This can only happen when we are doing read-only scrub on read-only
+ * mount.
+ * In that case we should not start a new transaction on read-only fs.
+ * Thus here we skip all chunk allocations.
+ */
+ if (sb_rdonly(fs_info->sb)) {
+ mutex_lock(&fs_info->ro_block_group_mutex);
+ ret = inc_block_group_ro(cache, 0);
+ mutex_unlock(&fs_info->ro_block_group_mutex);
+ return ret;
+ }
+
do {
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
commit 8e9eacad7ec7a9cbf262649ebf1fa6e6f6cc7d82 upstream.
The upstream commit had to handle a lookup_by_id variable that is only
present in 5.17. This version of the patch removes that variable, so the
__lookup_addr() function only handles the lookup as it is implemented in
5.15 and 5.16. It also removes one 'const' keyword to prevent a warning
due to differing const-ness in the 5.17 version of addresses_equal().
The MPTCP endpoint list is under RCU protection, guarded by the
pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list
without acquiring the spin-lock nor under the RCU critical section.
This change addresses the issue performing the lookup and the endpoint
update under the pernet spinlock.
Cc: <stable(a)vger.kernel.org> # 5.15.x
Cc: <stable(a)vger.kernel.org> # 5.16.x
Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
Acked-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
---
net/mptcp/pm_netlink.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 65764c8171b3..5d305fafd0e9 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -459,6 +459,18 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullm
return i;
}
+static struct mptcp_pm_addr_entry *
+__lookup_addr(struct pm_nl_pernet *pernet, struct mptcp_addr_info *info)
+{
+ struct mptcp_pm_addr_entry *entry;
+
+ list_for_each_entry(entry, &pernet->local_addr_list, list) {
+ if (addresses_equal(&entry->addr, info, true))
+ return entry;
+ }
+ return NULL;
+}
+
static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
{
struct sock *sk = (struct sock *)msk;
@@ -1725,17 +1737,21 @@ static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info)
if (addr.flags & MPTCP_PM_ADDR_FLAG_BACKUP)
bkup = 1;
- list_for_each_entry(entry, &pernet->local_addr_list, list) {
- if (addresses_equal(&entry->addr, &addr.addr, true)) {
- mptcp_nl_addr_backup(net, &entry->addr, bkup);
-
- if (bkup)
- entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
- else
- entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP;
- }
+ spin_lock_bh(&pernet->lock);
+ entry = __lookup_addr(pernet, &addr.addr);
+ if (!entry) {
+ spin_unlock_bh(&pernet->lock);
+ return -EINVAL;
}
+ if (bkup)
+ entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
+ else
+ entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP;
+ addr = *entry;
+ spin_unlock_bh(&pernet->lock);
+
+ mptcp_nl_addr_backup(net, &addr.addr, bkup);
return 0;
}
--
2.35.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 28b21c558a3753171097193b6f6602a94169093a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 21 Jan 2022 15:44:39 +0000
Subject: [PATCH] btrfs: fix use-after-free after failure to create a snapshot
At ioctl.c:create_snapshot(), we allocate a pending snapshot structure and
then attach it to the transaction's list of pending snapshots. After that
we call btrfs_commit_transaction(), and if that returns an error we jump
to 'fail' label, where we kfree() the pending snapshot structure. This can
result in a later use-after-free of the pending snapshot:
1) We allocated the pending snapshot and added it to the transaction's
list of pending snapshots;
2) We call btrfs_commit_transaction(), and it fails either at the first
call to btrfs_run_delayed_refs() or btrfs_start_dirty_block_groups().
In both cases, we don't abort the transaction and we release our
transaction handle. We jump to the 'fail' label and free the pending
snapshot structure. We return with the pending snapshot still in the
transaction's list;
3) Another task commits the transaction. This time there's no error at
all, and then during the transaction commit it accesses a pointer
to the pending snapshot structure that the snapshot creation task
has already freed, resulting in a user-after-free.
This issue could actually be detected by smatch, which produced the
following warning:
fs/btrfs/ioctl.c:843 create_snapshot() warn: '&pending_snapshot->list' not removed from list
So fix this by not having the snapshot creation ioctl directly add the
pending snapshot to the transaction's list. Instead add the pending
snapshot to the transaction handle, and then at btrfs_commit_transaction()
we add the snapshot to the list only when we can guarantee that any error
returned after that point will result in a transaction abort, in which
case the ioctl code can safely free the pending snapshot and no one can
access it anymore.
CC: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index eef5b300b9a9..90c11ddff6e5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -805,10 +805,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
goto fail;
}
- spin_lock(&fs_info->trans_lock);
- list_add(&pending_snapshot->list,
- &trans->transaction->pending_snapshots);
- spin_unlock(&fs_info->trans_lock);
+ trans->pending_snapshot = pending_snapshot;
ret = btrfs_commit_transaction(trans);
if (ret)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 03de89b45f27..c43bbc7f623e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2000,6 +2000,27 @@ static inline void btrfs_wait_delalloc_flush(struct btrfs_fs_info *fs_info)
btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
}
+/*
+ * Add a pending snapshot associated with the given transaction handle to the
+ * respective handle. This must be called after the transaction commit started
+ * and while holding fs_info->trans_lock.
+ * This serves to guarantee a caller of btrfs_commit_transaction() that it can
+ * safely free the pending snapshot pointer in case btrfs_commit_transaction()
+ * returns an error.
+ */
+static void add_pending_snapshot(struct btrfs_trans_handle *trans)
+{
+ struct btrfs_transaction *cur_trans = trans->transaction;
+
+ if (!trans->pending_snapshot)
+ return;
+
+ lockdep_assert_held(&trans->fs_info->trans_lock);
+ ASSERT(cur_trans->state >= TRANS_STATE_COMMIT_START);
+
+ list_add(&trans->pending_snapshot->list, &cur_trans->pending_snapshots);
+}
+
int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -2073,6 +2094,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
if (cur_trans->state >= TRANS_STATE_COMMIT_START) {
enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED;
+ add_pending_snapshot(trans);
+
spin_unlock(&fs_info->trans_lock);
refcount_inc(&cur_trans->use_count);
@@ -2163,6 +2186,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
* COMMIT_DOING so make sure to wait for num_writers to == 1 again.
*/
spin_lock(&fs_info->trans_lock);
+ add_pending_snapshot(trans);
cur_trans->state = TRANS_STATE_COMMIT_DOING;
spin_unlock(&fs_info->trans_lock);
wait_event(cur_trans->writer_wait,
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 1852ed9de7fd..9402d8d94484 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -123,6 +123,8 @@ struct btrfs_trans_handle {
struct btrfs_transaction *transaction;
struct btrfs_block_rsv *block_rsv;
struct btrfs_block_rsv *orig_rsv;
+ /* Set by a task that wants to create a snapshot. */
+ struct btrfs_pending_snapshot *pending_snapshot;
refcount_t use_count;
unsigned int type;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23e3404de1aecc62c14ac96d4b63403c3e0f52d5 Mon Sep 17 00:00:00 2001
From: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Date: Wed, 22 Dec 2021 13:48:12 +0900
Subject: [PATCH] spi: uniphier: Fix a bug that doesn't point to private data
correctly
In uniphier_spi_remove(), there is a wrong code to get private data from
the platform device, so the driver can't be removed properly.
The driver should get spi_master from the platform device and retrieve
the private data from it.
Cc: <stable(a)vger.kernel.org>
Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Link: https://lore.kernel.org/r/1640148492-32178-1-git-send-email-hayashi.kunihik…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-uniphier.c b/drivers/spi/spi-uniphier.c
index 8900e51e1a1c..342ee8d2c476 100644
--- a/drivers/spi/spi-uniphier.c
+++ b/drivers/spi/spi-uniphier.c
@@ -767,12 +767,13 @@ static int uniphier_spi_probe(struct platform_device *pdev)
static int uniphier_spi_remove(struct platform_device *pdev)
{
- struct uniphier_spi_priv *priv = platform_get_drvdata(pdev);
+ struct spi_master *master = platform_get_drvdata(pdev);
+ struct uniphier_spi_priv *priv = spi_master_get_devdata(master);
- if (priv->master->dma_tx)
- dma_release_channel(priv->master->dma_tx);
- if (priv->master->dma_rx)
- dma_release_channel(priv->master->dma_rx);
+ if (master->dma_tx)
+ dma_release_channel(master->dma_tx);
+ if (master->dma_rx)
+ dma_release_channel(master->dma_rx);
clk_disable_unprepare(priv->clk);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23e3404de1aecc62c14ac96d4b63403c3e0f52d5 Mon Sep 17 00:00:00 2001
From: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Date: Wed, 22 Dec 2021 13:48:12 +0900
Subject: [PATCH] spi: uniphier: Fix a bug that doesn't point to private data
correctly
In uniphier_spi_remove(), there is a wrong code to get private data from
the platform device, so the driver can't be removed properly.
The driver should get spi_master from the platform device and retrieve
the private data from it.
Cc: <stable(a)vger.kernel.org>
Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Link: https://lore.kernel.org/r/1640148492-32178-1-git-send-email-hayashi.kunihik…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-uniphier.c b/drivers/spi/spi-uniphier.c
index 8900e51e1a1c..342ee8d2c476 100644
--- a/drivers/spi/spi-uniphier.c
+++ b/drivers/spi/spi-uniphier.c
@@ -767,12 +767,13 @@ static int uniphier_spi_probe(struct platform_device *pdev)
static int uniphier_spi_remove(struct platform_device *pdev)
{
- struct uniphier_spi_priv *priv = platform_get_drvdata(pdev);
+ struct spi_master *master = platform_get_drvdata(pdev);
+ struct uniphier_spi_priv *priv = spi_master_get_devdata(master);
- if (priv->master->dma_tx)
- dma_release_channel(priv->master->dma_tx);
- if (priv->master->dma_rx)
- dma_release_channel(priv->master->dma_rx);
+ if (master->dma_tx)
+ dma_release_channel(master->dma_tx);
+ if (master->dma_rx)
+ dma_release_channel(master->dma_rx);
clk_disable_unprepare(priv->clk);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23e3404de1aecc62c14ac96d4b63403c3e0f52d5 Mon Sep 17 00:00:00 2001
From: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Date: Wed, 22 Dec 2021 13:48:12 +0900
Subject: [PATCH] spi: uniphier: Fix a bug that doesn't point to private data
correctly
In uniphier_spi_remove(), there is a wrong code to get private data from
the platform device, so the driver can't be removed properly.
The driver should get spi_master from the platform device and retrieve
the private data from it.
Cc: <stable(a)vger.kernel.org>
Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Link: https://lore.kernel.org/r/1640148492-32178-1-git-send-email-hayashi.kunihik…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-uniphier.c b/drivers/spi/spi-uniphier.c
index 8900e51e1a1c..342ee8d2c476 100644
--- a/drivers/spi/spi-uniphier.c
+++ b/drivers/spi/spi-uniphier.c
@@ -767,12 +767,13 @@ static int uniphier_spi_probe(struct platform_device *pdev)
static int uniphier_spi_remove(struct platform_device *pdev)
{
- struct uniphier_spi_priv *priv = platform_get_drvdata(pdev);
+ struct spi_master *master = platform_get_drvdata(pdev);
+ struct uniphier_spi_priv *priv = spi_master_get_devdata(master);
- if (priv->master->dma_tx)
- dma_release_channel(priv->master->dma_tx);
- if (priv->master->dma_rx)
- dma_release_channel(priv->master->dma_rx);
+ if (master->dma_tx)
+ dma_release_channel(master->dma_tx);
+ if (master->dma_rx)
+ dma_release_channel(master->dma_rx);
clk_disable_unprepare(priv->clk);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23e3404de1aecc62c14ac96d4b63403c3e0f52d5 Mon Sep 17 00:00:00 2001
From: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Date: Wed, 22 Dec 2021 13:48:12 +0900
Subject: [PATCH] spi: uniphier: Fix a bug that doesn't point to private data
correctly
In uniphier_spi_remove(), there is a wrong code to get private data from
the platform device, so the driver can't be removed properly.
The driver should get spi_master from the platform device and retrieve
the private data from it.
Cc: <stable(a)vger.kernel.org>
Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Link: https://lore.kernel.org/r/1640148492-32178-1-git-send-email-hayashi.kunihik…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-uniphier.c b/drivers/spi/spi-uniphier.c
index 8900e51e1a1c..342ee8d2c476 100644
--- a/drivers/spi/spi-uniphier.c
+++ b/drivers/spi/spi-uniphier.c
@@ -767,12 +767,13 @@ static int uniphier_spi_probe(struct platform_device *pdev)
static int uniphier_spi_remove(struct platform_device *pdev)
{
- struct uniphier_spi_priv *priv = platform_get_drvdata(pdev);
+ struct spi_master *master = platform_get_drvdata(pdev);
+ struct uniphier_spi_priv *priv = spi_master_get_devdata(master);
- if (priv->master->dma_tx)
- dma_release_channel(priv->master->dma_tx);
- if (priv->master->dma_rx)
- dma_release_channel(priv->master->dma_rx);
+ if (master->dma_tx)
+ dma_release_channel(master->dma_tx);
+ if (master->dma_rx)
+ dma_release_channel(master->dma_rx);
clk_disable_unprepare(priv->clk);
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23e3404de1aecc62c14ac96d4b63403c3e0f52d5 Mon Sep 17 00:00:00 2001
From: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Date: Wed, 22 Dec 2021 13:48:12 +0900
Subject: [PATCH] spi: uniphier: Fix a bug that doesn't point to private data
correctly
In uniphier_spi_remove(), there is a wrong code to get private data from
the platform device, so the driver can't be removed properly.
The driver should get spi_master from the platform device and retrieve
the private data from it.
Cc: <stable(a)vger.kernel.org>
Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
Link: https://lore.kernel.org/r/1640148492-32178-1-git-send-email-hayashi.kunihik…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-uniphier.c b/drivers/spi/spi-uniphier.c
index 8900e51e1a1c..342ee8d2c476 100644
--- a/drivers/spi/spi-uniphier.c
+++ b/drivers/spi/spi-uniphier.c
@@ -767,12 +767,13 @@ static int uniphier_spi_probe(struct platform_device *pdev)
static int uniphier_spi_remove(struct platform_device *pdev)
{
- struct uniphier_spi_priv *priv = platform_get_drvdata(pdev);
+ struct spi_master *master = platform_get_drvdata(pdev);
+ struct uniphier_spi_priv *priv = spi_master_get_devdata(master);
- if (priv->master->dma_tx)
- dma_release_channel(priv->master->dma_tx);
- if (priv->master->dma_rx)
- dma_release_channel(priv->master->dma_rx);
+ if (master->dma_tx)
+ dma_release_channel(master->dma_tx);
+ if (master->dma_rx)
+ dma_release_channel(master->dma_rx);
clk_disable_unprepare(priv->clk);
Upstream commit id in Linux 5.17-rc:
99510e1afb4863a225207146bd988064c5fd0629 ("drm/i915: Disable DSB usage
for now").
I'd like to nominate that patch for application to the 5.15-stable
tree. It applies trivially after dropping the ...
.has_pxp = 1, \
... line of context. That line of context was introduced by the
unrelated feature introduced later by
6f8e203897144e59de00ed910982af3d7c3e4a7f ("drm/i915/pxp: enable PXP
for integrated Gen12"), so can be safely dropped.
Disabling use of the DSB for GAMMA_LUT updates should fix corrupted
display colors on Intel Tigerlake, Rocketlake, DG-1 and Alderlake-S
Generation 12 graphics. Good explanation is in the upstream commit,
but for reference here the bug report which led to the bug diagnosis
and fix:
https://gitlab.freedesktop.org/drm/intel/-/issues/3916
This would make high color precision and HDR display modes usable on
Gen 12 graphics with Linux 5.51-stable.
Thanks for consideration,
-mario
I'm announcing the release of the 5.15.20 kernel.
All users of the 5.15 kernel series must upgrade.
The updated 5.15.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.15.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/gpu/drm/vc4/vc4_hdmi.c | 25 +++----
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 +++-
drivers/net/ethernet/intel/e1000e/netdev.c | 6 +
drivers/net/ethernet/intel/i40e/i40e.h | 1
drivers/net/ethernet/intel/i40e/i40e_main.c | 31 ++++++++
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 3
drivers/net/ethernet/mellanox/mlx5/core/en/rep/bond.c | 32 ++++-----
drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c | 6 +
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 13 +++
drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c | 4 +
drivers/net/ethernet/mellanox/mlx5/core/esw/diag/bridge_tracepoint.h | 2
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 2
drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c | 9 +-
drivers/net/ethernet/mellanox/mlx5/core/port.c | 9 +-
drivers/net/ipa/ipa_endpoint.c | 21 ++++--
drivers/net/ipa/ipa_endpoint.h | 17 ++++
drivers/net/usb/ipheth.c | 6 -
drivers/pci/hotplug/pciehp_hpc.c | 7 +-
fs/lockd/svcsubs.c | 18 ++---
fs/notify/fanotify/fanotify_user.c | 6 -
fs/overlayfs/copy_up.c | 16 +++-
kernel/cgroup/cgroup-v1.c | 14 ++++
kernel/cgroup/cpuset.c | 3
mm/gup.c | 35 ++++++++--
net/core/rtnetlink.c | 6 +
net/ipv4/tcp_input.c | 2
net/packet/af_packet.c | 8 +-
net/sched/cls_api.c | 11 ++-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 -
30 files changed, 240 insertions(+), 94 deletions(-)
Alex Elder (2):
net: ipa: use a bitmap for endpoint replenish_enabled
net: ipa: prevent concurrent replenish
Christoph Fritz (1):
ovl: fix NULL pointer dereference in copy up warning
Dan Carpenter (1):
fanotify: Fix stale file descriptor in copy_event_to_user()
Dima Chumak (1):
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
Eric Dumazet (4):
net: sched: fix use-after-free in tc_new_tfilter()
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
Eric W. Biederman (1):
cgroup-v1: Require capabilities to set release_agent
Gal Pressman (1):
net/mlx5e: Fix module EEPROM query
Georgi Valkov (1):
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
Greg Kroah-Hartman (1):
Linux 5.15.20
J. Bruce Fields (2):
lockd: fix server crash on reboot of client holding lock
lockd: fix failure to cleanup client locks
Jedrzej Jagielski (1):
i40e: Fix reset bw limit when DCB enabled with 1 TC
John Hubbard (1):
Revert "mm/gup: small refactoring: simplify try_grab_page()"
Karen Sornek (1):
i40e: Fix reset path while removing the driver
Lukas Wunner (1):
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
Maher Sanalla (1):
net/mlx5: Use del_timer_sync in fw reset flow of halting poll
Maor Dickman (2):
net/mlx5e: Fix handling of wrong devices during bond netevent
net/mlx5: E-Switch, Fix uninitialized variable modact
Maxim Mikityanskiy (1):
net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
Maxime Ripard (1):
drm/vc4: hdmi: Make sure the device is powered with CEC
Miklos Szeredi (1):
ovl: don't fail copy up if no fileattr support on upper
Paolo Abeni (1):
selftests: mptcp: fix ipv6 routing setup
Raed Salem (1):
net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
Raju Rangoju (1):
net: amd-xgbe: ensure to reset the tx_timer_active flag
Roi Dayan (1):
net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion
Sasha Neftin (1):
e1000e: Handshake with CSME starts from ADL platforms
Shyam Sundar S K (1):
net: amd-xgbe: Fix skb data length underflow
Tianchen Ding (1):
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
Vlad Buslov (2):
net/mlx5: Bridge, take rtnl lock in init error handler
net/mlx5: Bridge, ensure dev_name is null-terminated
I'm announcing the release of the 5.10.97 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/accounting/psi.rst | 3
Makefile | 2
arch/x86/include/asm/kvm_host.h | 1
arch/x86/kernel/cpu/mce/intel.c | 2
arch/x86/kvm/svm/nested.c | 10 +-
arch/x86/kvm/svm/svm.c | 2
arch/x86/kvm/svm/svm.h | 2
arch/x86/kvm/vmx/nested.c | 1
arch/x86/kvm/x86.c | 2
drivers/bus/simple-pm-bus.c | 39 ---------
drivers/gpu/drm/vc4/vc4_hdmi.c | 25 +++---
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 +++
drivers/net/ethernet/mellanox/mlx5/core/en/rep/bond.c | 32 +++----
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 2
drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c | 2
drivers/net/ipa/ipa_endpoint.c | 25 ++++--
drivers/net/ipa/ipa_endpoint.h | 15 +++
drivers/net/usb/ipheth.c | 6 -
drivers/pci/hotplug/pciehp_hpc.c | 7 -
fs/notify/fanotify/fanotify_user.c | 6 -
include/linux/psi.h | 2
include/linux/psi_types.h | 3
kernel/cgroup/cgroup-v1.c | 14 +++
kernel/cgroup/cgroup.c | 11 +-
kernel/cgroup/cpuset.c | 3
kernel/sched/psi.c | 66 +++++++---------
net/core/rtnetlink.c | 6 -
net/ipv4/tcp_input.c | 2
net/packet/af_packet.c | 8 +
net/sched/cls_api.c | 11 +-
30 files changed, 175 insertions(+), 149 deletions(-)
Alex Elder (3):
net: ipa: fix atomic update in ipa_endpoint_replenish()
net: ipa: use a bitmap for endpoint replenish_enabled
net: ipa: prevent concurrent replenish
Dan Carpenter (1):
fanotify: Fix stale file descriptor in copy_event_to_user()
Eric Dumazet (4):
net: sched: fix use-after-free in tc_new_tfilter()
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
Eric W. Biederman (1):
cgroup-v1: Require capabilities to set release_agent
Georgi Valkov (1):
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
Greg Kroah-Hartman (1):
Linux 5.10.97
Kevin Hilman (1):
Revert "drivers: bus: simple-pm-bus: Add support for probing simple bus only devices"
Lukas Wunner (1):
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
Maher Sanalla (1):
net/mlx5: Use del_timer_sync in fw reset flow of halting poll
Maor Dickman (2):
net/mlx5e: Fix handling of wrong devices during bond netevent
net/mlx5: E-Switch, Fix uninitialized variable modact
Maxime Ripard (1):
drm/vc4: hdmi: Make sure the device is powered with CEC
Raju Rangoju (1):
net: amd-xgbe: ensure to reset the tx_timer_active flag
Sean Christopherson (1):
KVM: x86: Forcibly leave nested virt when SMM state is toggled
Shyam Sundar S K (1):
net: amd-xgbe: Fix skb data length underflow
Suren Baghdasaryan (1):
psi: Fix uaf issue when psi trigger is destroyed while being polled
Tianchen Ding (1):
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
Tony Luck (2):
x86/mce: Add Xeon Sapphire Rapids to list of CPUs that support PPIN
x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
I'm announcing the release of the 5.4.177 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/accounting/psi.rst | 3 -
Makefile | 2
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 ++++++
drivers/net/usb/ipheth.c | 6 +-
drivers/pci/hotplug/pciehp_hpc.c | 7 +--
include/linux/psi.h | 2
include/linux/psi_types.h | 3 -
kernel/cgroup/cgroup-v1.c | 14 ++++++
kernel/cgroup/cgroup.c | 11 +++--
kernel/cgroup/cpuset.c | 3 -
kernel/sched/psi.c | 66 +++++++++++++------------------
net/core/rtnetlink.c | 6 +-
net/packet/af_packet.c | 8 ++-
net/sched/cls_api.c | 11 +++--
14 files changed, 93 insertions(+), 63 deletions(-)
Eric Dumazet (3):
net: sched: fix use-after-free in tc_new_tfilter()
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
Eric W. Biederman (1):
cgroup-v1: Require capabilities to set release_agent
Georgi Valkov (1):
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
Greg Kroah-Hartman (1):
Linux 5.4.177
Lukas Wunner (1):
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
Raju Rangoju (1):
net: amd-xgbe: ensure to reset the tx_timer_active flag
Shyam Sundar S K (1):
net: amd-xgbe: Fix skb data length underflow
Suren Baghdasaryan (1):
psi: Fix uaf issue when psi trigger is destroyed while being polled
Tianchen Ding (1):
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
On Fri, Feb 04, 2022 at 11:28:19AM -0600, Tabitha Sable wrote:
> I'm happy to help with this, but I'm not familiar with the conventions for
> sending in backport diffs.
>
> I've read through the docs on kernelnewbies and found them to be both
> overwhelmingly big and also not directly relevant to this particular
> situation. I think if I simply try to follow them I'll foul things up.
Yeah, it's not the same thing at all.
> Can I simply make the changes against the appropriate git branch, build and
> test, and then email in the diff to stable(a)vger.kernel.org copying most of
> what you've put in the original email, greg?
Yes, that's exactly what we need here. Be sure to let me know what the
git id of the commit is in Linus's tree so we can properly track it (you
can put it in the changelog text somewhere.)
thanks,
greg k-h
This is the start of the stable review cycle for the 5.16.6 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 06 Feb 2022 09:19:05 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.6-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.16.6-rc1
Eric Dumazet <edumazet(a)google.com>
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
Eric Dumazet <edumazet(a)google.com>
tcp: fix mem under-charging with zerocopy sendmsg()
Eric Dumazet <edumazet(a)google.com>
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Handshake with CSME starts from ADL platforms
Tianchen Ding <dtcccc(a)linux.alibaba.com>
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
He Fengqing <hefengqing(a)huawei.com>
bpf: Fix possible race in inc_misses_counter
Alex Elder <elder(a)linaro.org>
net: ipa: request IPA register values be retained
Eric Dumazet <edumazet(a)google.com>
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
Eric Dumazet <edumazet(a)google.com>
net: sched: fix use-after-free in tc_new_tfilter()
Dan Carpenter <dan.carpenter(a)oracle.com>
fanotify: Fix stale file descriptor in copy_event_to_user()
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
net: amd-xgbe: Fix skb data length underflow
Raju Rangoju <Raju.Rangoju(a)amd.com>
net: amd-xgbe: ensure to reset the tx_timer_active flag
Karen Sornek <karen.sornek(a)intel.com>
i40e: Fix reset path while removing the driver
Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
i40e: Fix reset bw limit when DCB enabled with 1 TC
Georgi Valkov <gvalkov(a)abv.bg>
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
Roi Dayan <roid(a)nvidia.com>
net/mlx5e: Avoid implicit modify hdr for decap drop rule
Maor Dickman <maord(a)nvidia.com>
net/mlx5: E-Switch, Fix uninitialized variable modact
Khalid Manaa <khalidm(a)nvidia.com>
net/mlx5e: Fix broken SKB allocation in HW-GRO
Khalid Manaa <khalidm(a)nvidia.com>
net/mlx5e: Fix wrong calculation of header index in HW_GRO
Kees Cook <keescook(a)chromium.org>
net/mlx5e: Avoid field-overflowing memcpy()
Roi Dayan <roid(a)nvidia.com>
net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion
Maxim Mikityanskiy <maximmi(a)nvidia.com>
net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
Dima Chumak <dchumak(a)nvidia.com>
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
Roi Dayan <roid(a)nvidia.com>
net/mlx5e: TC, Reject rules with forward and drop actions
Gal Pressman <gal(a)nvidia.com>
net/mlx5e: Fix module EEPROM query
Maher Sanalla <msanalla(a)nvidia.com>
net/mlx5: Use del_timer_sync in fw reset flow of halting poll
Maor Dickman <maord(a)nvidia.com>
net/mlx5e: Fix handling of wrong devices during bond netevent
Vlad Buslov <vladbu(a)nvidia.com>
net/mlx5: Bridge, ensure dev_name is null-terminated
Vlad Buslov <vladbu(a)nvidia.com>
net/mlx5: Bridge, take rtnl lock in init error handler
Roi Dayan <roid(a)nvidia.com>
net/mlx5e: TC, Reject rules with drop and modify hdr action
Raed Salem <raeds(a)nvidia.com>
net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
Raed Salem <raeds(a)nvidia.com>
net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
J. Bruce Fields <bfields(a)redhat.com>
lockd: fix failure to cleanup client locks
J. Bruce Fields <bfields(a)redhat.com>
lockd: fix server crash on reboot of client holding lock
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: don't fail copy up if no fileattr support on upper
Jonathan McDowell <noodles(a)earth.li>
net: phy: Fix qca8081 with speeds lower than 2.5Gb/s
John Hubbard <jhubbard(a)nvidia.com>
Revert "mm/gup: small refactoring: simplify try_grab_page()"
Eric W. Biederman <ebiederm(a)xmission.com>
cgroup-v1: Require capabilities to set release_agent
Maxime Ripard <maxime(a)cerno.tech>
drm/vc4: hdmi: Make sure the device is powered with CEC
Alex Elder <elder(a)linaro.org>
net: ipa: prevent concurrent replenish
Alex Elder <elder(a)linaro.org>
net: ipa: use a bitmap for endpoint replenish_enabled
Paolo Abeni <pabeni(a)redhat.com>
selftests: mptcp: fix ipv6 routing setup
Lukas Wunner <lukas(a)wunner.de>
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
-------------
Diffstat:
Makefile | 4 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 25 ++++++-----
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 +++++-
drivers/net/ethernet/intel/e1000e/netdev.c | 6 ++-
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 31 ++++++++++++-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 6 +--
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en/rep/bond.c | 32 ++++++-------
.../ethernet/mellanox/mlx5/core/en/rep/bridge.c | 6 ++-
drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 5 +++
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 +-
.../mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 13 +++++-
.../mellanox/mlx5/core/en_accel/ipsec_rxtx.h | 9 ++--
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 30 ++++++++-----
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 15 ++++++-
.../net/ethernet/mellanox/mlx5/core/esw/bridge.c | 4 ++
.../mlx5/core/esw/diag/bridge_tracepoint.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 2 +-
.../ethernet/mellanox/mlx5/core/lib/fs_chains.c | 9 ++--
drivers/net/ethernet/mellanox/mlx5/core/port.c | 9 ++--
drivers/net/ipa/ipa_endpoint.c | 21 +++++++--
drivers/net/ipa/ipa_endpoint.h | 17 ++++++-
drivers/net/ipa/ipa_power.c | 52 ++++++++++++++++++++++
drivers/net/ipa/ipa_power.h | 7 +++
drivers/net/ipa/ipa_uc.c | 5 +++
drivers/net/phy/at803x.c | 26 +++++------
drivers/net/usb/ipheth.c | 6 +--
drivers/pci/hotplug/pciehp_hpc.c | 7 +--
fs/lockd/svcsubs.c | 18 ++++----
fs/notify/fanotify/fanotify_user.c | 6 +--
fs/overlayfs/copy_up.c | 12 ++++-
kernel/bpf/trampoline.c | 5 ++-
kernel/cgroup/cgroup-v1.c | 14 ++++++
kernel/cgroup/cpuset.c | 3 +-
mm/gup.c | 35 ++++++++++++---
net/core/rtnetlink.c | 6 ++-
net/ipv4/tcp.c | 7 ++-
net/ipv4/tcp_input.c | 2 +
net/packet/af_packet.c | 8 +++-
net/sched/cls_api.c | 11 +++--
tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 ++-
42 files changed, 374 insertions(+), 129 deletions(-)
Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx
before reporting the expedited quiescent state. Under heavy real-time
load, this can result in this function being preempted before the
quiescent state is reported, which can in turn prevent the expedited grace
period from completing. Tim Murray reports that the resulting expedited
grace periods can take hundreds of milliseconds and even more than one
second, when they should normally complete in less than a millisecond.
This was fine given that there were no particular response-time
constraints for synchronize_rcu_expedited(), as it was designed
for throughput rather than latency. However, some users now need
sub-100-millisecond response-time constratints.
This patch therefore follows Neeraj's suggestion (seconded by Tim and
by Uladzislau Rezki) of simply reversing the two operations.
Reported-by: Tim Murray <timmurray(a)google.com>
Reported-by: Joel Fernandes <joelaf(a)google.com>
Reported-by: Neeraj Upadhyay <quic_neeraju(a)quicinc.com>
Reviewed-by: Neeraj Upadhyay <quic_neeraju(a)quicinc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Tested-by: Tim Murray <timmurray(a)google.com>
Cc: Todd Kjos <tkjos(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.4.x
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
---
kernel/rcu/tree_plugin.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 109429e70a642..02ac057ba3f83 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -556,16 +556,16 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
- /* Unboost if we were boosted. */
- if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex)
- rt_mutex_futex_unlock(&rnp->boost_mtx.rtmutex);
-
/*
* If this was the last task on the expedited lists,
* then we need to report up the rcu_node hierarchy.
*/
if (!empty_exp && empty_exp_now)
rcu_report_exp_rnp(rnp, true);
+
+ /* Unboost if we were boosted. */
+ if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex)
+ rt_mutex_futex_unlock(&rnp->boost_mtx.rtmutex);
} else {
local_irq_restore(flags);
}
--
2.31.1.189.g2e36527f23
When I rewrote the VMA dumping logic for coredumps, I changed it to
recognize ELF library mappings based on the file being executable instead
of the mapping having an ELF header. But turns out, distros ship many ELF
libraries as non-executable, so the heuristic goes wrong...
Restore the old behavior where FILTER(ELF_HEADERS) dumps the first page of
any offset-0 readable mapping that starts with the ELF magic.
This fix is technically layer-breaking a bit, because it checks for
something ELF-specific in fs/coredump.c; but since we probably want to
share this between standard ELF and FDPIC ELF anyway, I guess it's fine?
And this also keeps the change small for backporting.
Cc: stable(a)vger.kernel.org
Fixes: 429a22e776a2 ("coredump: rework elf/elf_fdpic vma_dump_size() into common helper")
Reported-by: Bill Messmer <wmessmer(a)microsoft.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
---
@Bill: If you happen to have a kernel tree lying around, you could give
this a try and report back whether this solves your issues?
But if not, it's also fine, I've tested myself that with this patch
applied, the first 0x1000 bytes of non-executable libraries are dumped
into the coredump according to "readelf".
fs/coredump.c | 39 ++++++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index 1c060c0a2d72..b73817712dd2 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -42,6 +42,7 @@
#include <linux/path.h>
#include <linux/timekeeping.h>
#include <linux/sysctl.h>
+#include <linux/elf.h>
#include <linux/uaccess.h>
#include <asm/mmu_context.h>
@@ -980,6 +981,8 @@ static bool always_dump_vma(struct vm_area_struct *vma)
return false;
}
+#define DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER 1
+
/*
* Decide how much of @vma's contents should be included in a core dump.
*/
@@ -1039,9 +1042,20 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
* dump the first page to aid in determining what was mapped here.
*/
if (FILTER(ELF_HEADERS) &&
- vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ) &&
- (READ_ONCE(file_inode(vma->vm_file)->i_mode) & 0111) != 0)
- return PAGE_SIZE;
+ vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ)) {
+ if ((READ_ONCE(file_inode(vma->vm_file)->i_mode) & 0111) != 0)
+ return PAGE_SIZE;
+
+ /*
+ * ELF libraries aren't always executable.
+ * We'll want to check whether the mapping starts with the ELF
+ * magic, but not now - we're holding the mmap lock,
+ * so copy_from_user() doesn't work here.
+ * Use a placeholder instead, and fix it up later in
+ * dump_vma_snapshot().
+ */
+ return DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER;
+ }
#undef FILTER
@@ -1116,8 +1130,6 @@ int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
m->end = vma->vm_end;
m->flags = vma->vm_flags;
m->dump_size = vma_dump_size(vma, cprm->mm_flags);
-
- vma_data_size += m->dump_size;
}
mmap_write_unlock(mm);
@@ -1127,6 +1139,23 @@ int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
return -EFAULT;
}
+ for (i = 0; i < *vma_count; i++) {
+ struct core_vma_metadata *m = (*vma_meta) + i;
+
+ if (m->dump_size == DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER) {
+ char elfmag[SELFMAG];
+
+ if (copy_from_user(elfmag, (void __user *)m->start, SELFMAG) ||
+ memcmp(elfmag, ELFMAG, SELFMAG) != 0) {
+ m->dump_size = 0;
+ } else {
+ m->dump_size = PAGE_SIZE;
+ }
+ }
+
+ vma_data_size += m->dump_size;
+ }
+
*vma_data_size_ptr = vma_data_size;
return 0;
}
base-commit: 0280e3c58f92b2fe0e8fbbdf8d386449168de4a8
--
2.35.0.rc0.227.g00780c9af4-goog
This reverts commit 478ba09edc1f2f2ee27180a06150cb2d1a686f9c.
That commit was meant as a fix for setattrs with by fd (e.g. ftruncate)
to use an open fid instead of the first fid it found on lookup.
The proper fix for that is to use the fid associated with the open file
struct, available in iattr->ia_file for such operations, and was
actually done just before in 66246641609b ("9p: retrieve fid from file
when file instance exist.")
As such, this commit is no longer required.
Furthermore, changing lookup to return open fids first had unwanted side
effects, as it turns out the protocol forbids the use of open fids for
further walks (e.g. clone_fid) and we broke mounts for some servers
enforcing this rule.
Note this only reverts to the old working behaviour, but it's still
possible for lookup to return open fids if dentry->d_fsdata is not set,
so more work is needed to make sure we respect this rule in the future,
for example by adding a flag to the lookup functions to only match
certain fid open modes depending on caller requirements.
Fixes: 478ba09edc1f ("fs/9p: search open fids first")
Cc: stable(a)vger.kernel.org # v5.11+
Reported-by: ron minnich <rminnich(a)gmail.com>
Reported-by: ng(a)0x80.stream
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
---
I'm sorry I didn't find time to check this properly fixes the clone
open fid issues, but Ron reported it did so I'll assume it did for now.
I'll try to find time to either implement the check in ganesha or use
another server -- if you have a suggestion that'd run either a ramfs or
export a local filesystem from linux I'm all ears, I couldn't get go9p
to work in the very little time I tried.
I did however check that Greg's original open/chmod 0/ftruncate pattern
works (while truncate was refused).
Also, before revert the truncate by path wasn't refused, and now is
again, so that's definitely good.
I've also tested open-unlink-ftruncate and it works properly with
ganesha, but not with qemu -- it looks like qemu tries to access the
file by path in setattr even if the fid has an associated fd, so that'd
be a qemu bug, but it's unrelated to this patch anyway.
Unless there are issues with this patch I'll send it to Linus around
Friday
fs/9p/fid.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 6aab046c98e2..79df61fe0e59 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -96,12 +96,8 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
dentry, dentry, from_kuid(&init_user_ns, uid),
any);
ret = NULL;
-
- if (d_inode(dentry))
- ret = v9fs_fid_find_inode(d_inode(dentry), uid);
-
/* we'll recheck under lock if there's anything to look in */
- if (!ret && dentry->d_fsdata) {
+ if (dentry->d_fsdata) {
struct hlist_head *h = (struct hlist_head *)&dentry->d_fsdata;
spin_lock(&dentry->d_lock);
@@ -113,6 +109,9 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
}
}
spin_unlock(&dentry->d_lock);
+ } else {
+ if (dentry->d_inode)
+ ret = v9fs_fid_find_inode(dentry->d_inode, uid);
}
return ret;
--
2.34.1
From: Lang Yu <lang.yu(a)amd.com>
Subject: mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to add
ZONE_DEVICE memory, if requested free mem region's end pfn were huge(e.g.,
0x400000000), the node_end_pfn() will be also huge (see
move_pfn_range_to_zone()). Thus it creates a huge hole between
node_start_pfn() and node_end_pfn().
We found on some AMD APUs, amdkfd requested such a free mem region and
created a huge hole. In such a case, following code snippet was just
doing busy test_bit() looping on the huge hole.
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_online_page(pfn);
if (!page)
continue;
...
}
So we got a soft lockup:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
RIP: 0010:pfn_to_online_page+0x5/0xd0
Call Trace:
? kmemleak_scan+0x16a/0x440
kmemleak_write+0x306/0x3a0
? common_file_perm+0x72/0x170
full_proxy_write+0x5c/0x90
vfs_write+0xb9/0x260
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
I did some tests with the patch.
(1) amdgpu module unloaded
before the patch:
real 0m0.976s
user 0m0.000s
sys 0m0.968s
after the patch:
real 0m0.981s
user 0m0.000s
sys 0m0.973s
(2) amdgpu module loaded
before the patch:
real 0m35.365s
user 0m0.000s
sys 0m35.354s
after the patch:
real 0m1.049s
user 0m0.000s
sys 0m1.042s
Link: https://lkml.kernel.org/r/20211108140029.721144-1-lang.yu@amd.com
Signed-off-by: Lang Yu <lang.yu(a)amd.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-scanning-potential-huge-holes
+++ a/mm/kmemleak.c
@@ -1410,7 +1410,8 @@ static void kmemleak_scan(void)
{
unsigned long flags;
struct kmemleak_object *object;
- int i;
+ struct zone *zone;
+ int __maybe_unused i;
int new_leaks = 0;
jiffies_last_scan = jiffies;
@@ -1450,9 +1451,9 @@ static void kmemleak_scan(void)
* Struct page scanning for each node.
*/
get_online_mems();
- for_each_online_node(i) {
- unsigned long start_pfn = node_start_pfn(i);
- unsigned long end_pfn = node_end_pfn(i);
+ for_each_populated_zone(zone) {
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -1461,8 +1462,8 @@ static void kmemleak_scan(void)
if (!page)
continue;
- /* only scan pages belonging to this node */
- if (page_to_nid(page) != i)
+ /* only scan pages belonging to this zone */
+ if (page_zone(page) != zone)
continue;
/* only scan if page is in use */
if (page_count(page) == 0)
_
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm/pgtable: define pte_index so that preprocessor could recognize it
Since commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*()
definitions") pte_index is a static inline and there is no define for it
that can be recognized by the preprocessor. As a result,
vm_insert_pages() uses slower loop over vm_insert_page() instead of
insert_pages() that amortizes the cost of spinlock operations when
inserting multiple pages.
Link: https://lkml.kernel.org/r/20220111145457.20748-1-rppt@kernel.org
Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Christian Dietrich <stettberger(a)dokucode.de>
Reviewed-by: Khalid Aziz <khalid.aziz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgtable.h | 1 +
1 file changed, 1 insertion(+)
--- a/include/linux/pgtable.h~mm-pgtable-define-pte_index-so-that-preprocessor-could-recognize-it
+++ a/include/linux/pgtable.h
@@ -62,6 +62,7 @@ static inline unsigned long pte_index(un
{
return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}
+#define pte_index pte_index
#ifndef pmd_index
static inline unsigned long pmd_index(unsigned long address)
_
From: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Subject: mm/debug_vm_pgtable: remove pte entry from the page table
Patch series "page table check fixes and cleanups", v5.
This patch (of 4):
The pte entry that is used in pte_advanced_tests() is never removed from
the page table at the end of the test.
The issue is detected by page_table_check, to repro compile kernel with
the following configs:
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_PAGE_TABLE_CHECK=y
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
During the boot the following BUG is printed:
[ 2.262821] debug_vm_pgtable: [debug_vm_pgtable ]: Validating
architecture page table helpers
[ 2.276826] ------------[ cut here ]------------
[ 2.280426] kernel BUG at mm/page_table_check.c:162!
[ 2.284118] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 2.287787] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.16.0-11413-g2c271fe77d52 #3
[ 2.293226] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org
04/01/2014
...
The entry should be properly removed from the page table before the page
is released to the free list.
Link: https://lkml.kernel.org/r/20220131203249.2832273-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20220131203249.2832273-2-pasha.tatashin@soleen.com
Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Tested-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug_vm_pgtable.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-remove-pte-entry-from-the-page-table
+++ a/mm/debug_vm_pgtable.c
@@ -171,6 +171,8 @@ static void __init pte_advanced_tests(st
ptep_test_and_clear_young(args->vma, args->vaddr, args->ptep);
pte = ptep_get(args->ptep);
WARN_ON(pte_young(pte));
+
+ ptep_get_and_clear_full(args->mm, args->vaddr, args->ptep, 1);
}
static void __init pte_savedwrite_tests(struct pgtable_debug_args *args)
_
hallo Greg
5.16.6-rc1
compiles, boots and runs on my x86_64
(Intel i5-11400, Fedora 35)
Thanks
Tested-by: Ronald Warsow <rwarsow(a)gmx.de>
regards
Ronald
This is a note to let you know that I've just added the patch titled
vt_ioctl: add array_index_nospec to VT_ACTIVATE
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 28cb138f559f8c1a1395f5564f86b8bbee83631b Mon Sep 17 00:00:00 2001
From: Jakob Koschel <jakobkoschel(a)gmail.com>
Date: Thu, 27 Jan 2022 15:44:05 +0100
Subject: vt_ioctl: add array_index_nospec to VT_ACTIVATE
in vt_setactivate an almost identical code path has been patched
with array_index_nospec. In the VT_ACTIVATE path the user input
is from a system call argument instead of a usercopy.
For consistency both code paths should have the same mitigations
applied.
Kasper Acknowledgements: Jakob Koschel, Brian Johannesmeyer, Kaveh
Razavi, Herbert Bos, Cristiano Giuffrida from the VUSec group at VU
Amsterdam.
Co-developed-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Signed-off-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Signed-off-by: Jakob Koschel <jakobkoschel(a)gmail.com>
Link: https://lore.kernel.org/r/20220127144406.3589293-2-jakobkoschel@gmail.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/vt_ioctl.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index e0714a9c9fd7..58013698635f 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -845,6 +845,7 @@ int vt_ioctl(struct tty_struct *tty,
return -ENXIO;
arg--;
+ arg = array_index_nospec(arg, MAX_NR_CONSOLES);
console_lock();
ret = vc_allocate(arg);
console_unlock();
--
2.35.1
This is a note to let you know that I've just added the patch titled
vt_ioctl: fix array_index_nospec in vt_setactivate
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 61cc70d9e8ef5b042d4ed87994d20100ec8896d9 Mon Sep 17 00:00:00 2001
From: Jakob Koschel <jakobkoschel(a)gmail.com>
Date: Thu, 27 Jan 2022 15:44:04 +0100
Subject: vt_ioctl: fix array_index_nospec in vt_setactivate
array_index_nospec ensures that an out-of-bounds value is set to zero
on the transient path. Decreasing the value by one afterwards causes
a transient integer underflow. vsa.console should be decreased first
and then sanitized with array_index_nospec.
Kasper Acknowledgements: Jakob Koschel, Brian Johannesmeyer, Kaveh
Razavi, Herbert Bos, Cristiano Giuffrida from the VUSec group at VU
Amsterdam.
Co-developed-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Signed-off-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Signed-off-by: Jakob Koschel <jakobkoschel(a)gmail.com>
Link: https://lore.kernel.org/r/20220127144406.3589293-1-jakobkoschel@gmail.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/vt_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index 3639bb6dc372..e0714a9c9fd7 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -599,8 +599,8 @@ static int vt_setactivate(struct vt_setactivate __user *sa)
if (vsa.console == 0 || vsa.console > MAX_NR_CONSOLES)
return -ENXIO;
- vsa.console = array_index_nospec(vsa.console, MAX_NR_CONSOLES + 1);
vsa.console--;
+ vsa.console = array_index_nospec(vsa.console, MAX_NR_CONSOLES);
console_lock();
ret = vc_allocate(vsa.console);
if (ret) {
--
2.35.1
commit 8e9eacad7ec7a9cbf262649ebf1fa6e6f6cc7d82 upstream.
The upstream commit had to handle a lookup_by_id variable that is only
present in 5.17. This version of the patch removes that variable, so the
__lookup_addr() function only handles the lookup as it is implemented in
5.15 and 5.16. It also removes one 'const' keyword to prevent a warning
due to differing const-ness in the 5.17 version of addresses_equal().
The MPTCP endpoint list is under RCU protection, guarded by the
pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list
without acquiring the spin-lock nor under the RCU critical section.
This change addresses the issue performing the lookup and the endpoint
update under the pernet spinlock.
Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
---
Paolo, can I add an ack or signoff tag from you?
The upstream commit (8e9eacad7ec7) was queued for the 5.16 and 5.15
stable trees, which brought along a few extra patches that didn't belong
in stable. I asked Greg to drop those patches from his queue, and this
particular commit required manual changes as described above (related to
the lookup_by_id variable that's new in 5.16).
This patch will not apply to the export branch. I confirmed that it
applies, builds, and runs on both the 5.16.5 and 5.15.19 branches. Self
tests succeed too.
When I send to the stable list, I'll also include these tags:
Cc: <stable(a)vger.kernel.org> # 5.15.x
Cc: <stable(a)vger.kernel.org> # 5.16.x
---
net/mptcp/pm_netlink.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 65764c8171b3..5d305fafd0e9 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -459,6 +459,18 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullm
return i;
}
+static struct mptcp_pm_addr_entry *
+__lookup_addr(struct pm_nl_pernet *pernet, struct mptcp_addr_info *info)
+{
+ struct mptcp_pm_addr_entry *entry;
+
+ list_for_each_entry(entry, &pernet->local_addr_list, list) {
+ if (addresses_equal(&entry->addr, info, true))
+ return entry;
+ }
+ return NULL;
+}
+
static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
{
struct sock *sk = (struct sock *)msk;
@@ -1725,17 +1737,21 @@ static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info)
if (addr.flags & MPTCP_PM_ADDR_FLAG_BACKUP)
bkup = 1;
- list_for_each_entry(entry, &pernet->local_addr_list, list) {
- if (addresses_equal(&entry->addr, &addr.addr, true)) {
- mptcp_nl_addr_backup(net, &entry->addr, bkup);
-
- if (bkup)
- entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
- else
- entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP;
- }
+ spin_lock_bh(&pernet->lock);
+ entry = __lookup_addr(pernet, &addr.addr);
+ if (!entry) {
+ spin_unlock_bh(&pernet->lock);
+ return -EINVAL;
}
+ if (bkup)
+ entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP;
+ else
+ entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP;
+ addr = *entry;
+ spin_unlock_bh(&pernet->lock);
+
+ mptcp_nl_addr_backup(net, &addr.addr, bkup);
return 0;
}
--
2.35.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e3135d3bfa5dfb658145238d2bc723a8e30c3a3 Mon Sep 17 00:00:00 2001
From: He Fengqing <hefengqing(a)huawei.com>
Date: Sat, 22 Jan 2022 10:29:36 +0000
Subject: [PATCH] bpf: Fix possible race in inc_misses_counter
It seems inc_misses_counter() suffers from same issue fixed in
the commit d979617aa84d ("bpf: Fixes possible race in update_prog_stats()
for 32bit arches"):
As it can run while interrupts are enabled, it could
be re-entered and the u64_stats syncp could be mangled.
Fixes: 9ed9e9ba2337 ("bpf: Count the number of times recursion was prevented")
Signed-off-by: He Fengqing <hefengqing(a)huawei.com>
Acked-by: John Fastabend <john.fastabend(a)gmail.com>
Link: https://lore.kernel.org/r/20220122102936.1219518-1-hefengqing@huawei.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 4b6974a195c1..5e7edf913060 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -550,11 +550,12 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
static void notrace inc_misses_counter(struct bpf_prog *prog)
{
struct bpf_prog_stats *stats;
+ unsigned int flags;
stats = this_cpu_ptr(prog->stats);
- u64_stats_update_begin(&stats->syncp);
+ flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->misses);
- u64_stats_update_end(&stats->syncp);
+ u64_stats_update_end_irqrestore(&stats->syncp, flags);
}
/* The logic is similar to bpf_prog_run(), but with an explicit
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6533e558c6505e94c3e0ed4281ed5e31ec985f4d Mon Sep 17 00:00:00 2001
From: Karen Sornek <karen.sornek(a)intel.com>
Date: Wed, 12 Jan 2022 10:19:47 +0100
Subject: [PATCH] i40e: Fix reset path while removing the driver
Fix the crash in kernel while dereferencing the NULL pointer,
when the driver is unloaded and simultaneously the VSI rings
are being stopped.
The hardware requires 50msec in order to finish RX queues
disable. For this purpose the driver spins in mdelay function
for the operation to be completed.
For example changing number of queues which requires reset would
fail in the following call stack:
1) i40e_prep_for_reset
2) i40e_pf_quiesce_all_vsi
3) i40e_quiesce_vsi
4) i40e_vsi_close
5) i40e_down
6) i40e_vsi_stop_rings
7) i40e_vsi_control_rx -> disable requires the delay of 50msecs
8) continue back in i40e_down function where
i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash
When the driver was spinning vsi_release called
i40e_vsi_free_arrays where the vsi->tx_rings resources
were freed and the pointer was set to NULL.
Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver")
Signed-off-by: Slawomir Laba <slawomirx.laba(a)intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Karen Sornek <karen.sornek(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2e02cc68cd3f..80c5cecaf2b5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -144,6 +144,7 @@ enum i40e_state_t {
__I40E_VIRTCHNL_OP_PENDING,
__I40E_RECOVERY_MODE,
__I40E_VF_RESETS_DISABLED, /* disable resets during i40e_remove */
+ __I40E_IN_REMOVE,
__I40E_VFS_RELEASING,
/* This must be last as it determines the size of the BITMAP */
__I40E_STATE_SIZE__,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5cb4dc69fe87..0c4b7dfb3b35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10863,6 +10863,9 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit,
bool lock_acquired)
{
int ret;
+
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
/* Now we wait for GRST to settle out.
* We don't have to delete the VEBs or VSIs from the hw switch
* because the reset will make them disappear.
@@ -12222,6 +12225,8 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
vsi->req_queue_pairs = queue_count;
i40e_prep_for_reset(pf);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return pf->alloc_rss_size;
pf->alloc_rss_size = new_rss_size;
@@ -13048,6 +13053,10 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
if (need_reset)
i40e_prep_for_reset(pf);
+ /* VSI shall be deleted in a moment, just return EINVAL */
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return -EINVAL;
+
old_prog = xchg(&vsi->xdp_prog, prog);
if (need_reset) {
@@ -15938,8 +15947,13 @@ static void i40e_remove(struct pci_dev *pdev)
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), 0);
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), 0);
- while (test_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
+ /* Grab __I40E_RESET_RECOVERY_PENDING and set __I40E_IN_REMOVE
+ * flags, once they are set, i40e_rebuild should not be called as
+ * i40e_prep_for_reset always returns early.
+ */
+ while (test_and_set_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
usleep_range(1000, 2000);
+ set_bit(__I40E_IN_REMOVE, pf->state);
if (pf->flags & I40E_FLAG_SRIOV_ENABLED) {
set_bit(__I40E_VF_RESETS_DISABLED, pf->state);
@@ -16138,6 +16152,9 @@ static void i40e_pci_error_reset_done(struct pci_dev *pdev)
{
struct i40e_pf *pf = pci_get_drvdata(pdev);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
+
i40e_reset_and_rebuild(pf, false, false);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6533e558c6505e94c3e0ed4281ed5e31ec985f4d Mon Sep 17 00:00:00 2001
From: Karen Sornek <karen.sornek(a)intel.com>
Date: Wed, 12 Jan 2022 10:19:47 +0100
Subject: [PATCH] i40e: Fix reset path while removing the driver
Fix the crash in kernel while dereferencing the NULL pointer,
when the driver is unloaded and simultaneously the VSI rings
are being stopped.
The hardware requires 50msec in order to finish RX queues
disable. For this purpose the driver spins in mdelay function
for the operation to be completed.
For example changing number of queues which requires reset would
fail in the following call stack:
1) i40e_prep_for_reset
2) i40e_pf_quiesce_all_vsi
3) i40e_quiesce_vsi
4) i40e_vsi_close
5) i40e_down
6) i40e_vsi_stop_rings
7) i40e_vsi_control_rx -> disable requires the delay of 50msecs
8) continue back in i40e_down function where
i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash
When the driver was spinning vsi_release called
i40e_vsi_free_arrays where the vsi->tx_rings resources
were freed and the pointer was set to NULL.
Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver")
Signed-off-by: Slawomir Laba <slawomirx.laba(a)intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Karen Sornek <karen.sornek(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2e02cc68cd3f..80c5cecaf2b5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -144,6 +144,7 @@ enum i40e_state_t {
__I40E_VIRTCHNL_OP_PENDING,
__I40E_RECOVERY_MODE,
__I40E_VF_RESETS_DISABLED, /* disable resets during i40e_remove */
+ __I40E_IN_REMOVE,
__I40E_VFS_RELEASING,
/* This must be last as it determines the size of the BITMAP */
__I40E_STATE_SIZE__,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5cb4dc69fe87..0c4b7dfb3b35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10863,6 +10863,9 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit,
bool lock_acquired)
{
int ret;
+
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
/* Now we wait for GRST to settle out.
* We don't have to delete the VEBs or VSIs from the hw switch
* because the reset will make them disappear.
@@ -12222,6 +12225,8 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
vsi->req_queue_pairs = queue_count;
i40e_prep_for_reset(pf);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return pf->alloc_rss_size;
pf->alloc_rss_size = new_rss_size;
@@ -13048,6 +13053,10 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
if (need_reset)
i40e_prep_for_reset(pf);
+ /* VSI shall be deleted in a moment, just return EINVAL */
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return -EINVAL;
+
old_prog = xchg(&vsi->xdp_prog, prog);
if (need_reset) {
@@ -15938,8 +15947,13 @@ static void i40e_remove(struct pci_dev *pdev)
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), 0);
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), 0);
- while (test_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
+ /* Grab __I40E_RESET_RECOVERY_PENDING and set __I40E_IN_REMOVE
+ * flags, once they are set, i40e_rebuild should not be called as
+ * i40e_prep_for_reset always returns early.
+ */
+ while (test_and_set_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
usleep_range(1000, 2000);
+ set_bit(__I40E_IN_REMOVE, pf->state);
if (pf->flags & I40E_FLAG_SRIOV_ENABLED) {
set_bit(__I40E_VF_RESETS_DISABLED, pf->state);
@@ -16138,6 +16152,9 @@ static void i40e_pci_error_reset_done(struct pci_dev *pdev)
{
struct i40e_pf *pf = pci_get_drvdata(pdev);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
+
i40e_reset_and_rebuild(pf, false, false);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6533e558c6505e94c3e0ed4281ed5e31ec985f4d Mon Sep 17 00:00:00 2001
From: Karen Sornek <karen.sornek(a)intel.com>
Date: Wed, 12 Jan 2022 10:19:47 +0100
Subject: [PATCH] i40e: Fix reset path while removing the driver
Fix the crash in kernel while dereferencing the NULL pointer,
when the driver is unloaded and simultaneously the VSI rings
are being stopped.
The hardware requires 50msec in order to finish RX queues
disable. For this purpose the driver spins in mdelay function
for the operation to be completed.
For example changing number of queues which requires reset would
fail in the following call stack:
1) i40e_prep_for_reset
2) i40e_pf_quiesce_all_vsi
3) i40e_quiesce_vsi
4) i40e_vsi_close
5) i40e_down
6) i40e_vsi_stop_rings
7) i40e_vsi_control_rx -> disable requires the delay of 50msecs
8) continue back in i40e_down function where
i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash
When the driver was spinning vsi_release called
i40e_vsi_free_arrays where the vsi->tx_rings resources
were freed and the pointer was set to NULL.
Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver")
Signed-off-by: Slawomir Laba <slawomirx.laba(a)intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Karen Sornek <karen.sornek(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2e02cc68cd3f..80c5cecaf2b5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -144,6 +144,7 @@ enum i40e_state_t {
__I40E_VIRTCHNL_OP_PENDING,
__I40E_RECOVERY_MODE,
__I40E_VF_RESETS_DISABLED, /* disable resets during i40e_remove */
+ __I40E_IN_REMOVE,
__I40E_VFS_RELEASING,
/* This must be last as it determines the size of the BITMAP */
__I40E_STATE_SIZE__,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5cb4dc69fe87..0c4b7dfb3b35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10863,6 +10863,9 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit,
bool lock_acquired)
{
int ret;
+
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
/* Now we wait for GRST to settle out.
* We don't have to delete the VEBs or VSIs from the hw switch
* because the reset will make them disappear.
@@ -12222,6 +12225,8 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
vsi->req_queue_pairs = queue_count;
i40e_prep_for_reset(pf);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return pf->alloc_rss_size;
pf->alloc_rss_size = new_rss_size;
@@ -13048,6 +13053,10 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
if (need_reset)
i40e_prep_for_reset(pf);
+ /* VSI shall be deleted in a moment, just return EINVAL */
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return -EINVAL;
+
old_prog = xchg(&vsi->xdp_prog, prog);
if (need_reset) {
@@ -15938,8 +15947,13 @@ static void i40e_remove(struct pci_dev *pdev)
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), 0);
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), 0);
- while (test_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
+ /* Grab __I40E_RESET_RECOVERY_PENDING and set __I40E_IN_REMOVE
+ * flags, once they are set, i40e_rebuild should not be called as
+ * i40e_prep_for_reset always returns early.
+ */
+ while (test_and_set_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
usleep_range(1000, 2000);
+ set_bit(__I40E_IN_REMOVE, pf->state);
if (pf->flags & I40E_FLAG_SRIOV_ENABLED) {
set_bit(__I40E_VF_RESETS_DISABLED, pf->state);
@@ -16138,6 +16152,9 @@ static void i40e_pci_error_reset_done(struct pci_dev *pdev)
{
struct i40e_pf *pf = pci_get_drvdata(pdev);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
+
i40e_reset_and_rebuild(pf, false, false);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6533e558c6505e94c3e0ed4281ed5e31ec985f4d Mon Sep 17 00:00:00 2001
From: Karen Sornek <karen.sornek(a)intel.com>
Date: Wed, 12 Jan 2022 10:19:47 +0100
Subject: [PATCH] i40e: Fix reset path while removing the driver
Fix the crash in kernel while dereferencing the NULL pointer,
when the driver is unloaded and simultaneously the VSI rings
are being stopped.
The hardware requires 50msec in order to finish RX queues
disable. For this purpose the driver spins in mdelay function
for the operation to be completed.
For example changing number of queues which requires reset would
fail in the following call stack:
1) i40e_prep_for_reset
2) i40e_pf_quiesce_all_vsi
3) i40e_quiesce_vsi
4) i40e_vsi_close
5) i40e_down
6) i40e_vsi_stop_rings
7) i40e_vsi_control_rx -> disable requires the delay of 50msecs
8) continue back in i40e_down function where
i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash
When the driver was spinning vsi_release called
i40e_vsi_free_arrays where the vsi->tx_rings resources
were freed and the pointer was set to NULL.
Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver")
Signed-off-by: Slawomir Laba <slawomirx.laba(a)intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Karen Sornek <karen.sornek(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2e02cc68cd3f..80c5cecaf2b5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -144,6 +144,7 @@ enum i40e_state_t {
__I40E_VIRTCHNL_OP_PENDING,
__I40E_RECOVERY_MODE,
__I40E_VF_RESETS_DISABLED, /* disable resets during i40e_remove */
+ __I40E_IN_REMOVE,
__I40E_VFS_RELEASING,
/* This must be last as it determines the size of the BITMAP */
__I40E_STATE_SIZE__,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5cb4dc69fe87..0c4b7dfb3b35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10863,6 +10863,9 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit,
bool lock_acquired)
{
int ret;
+
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
/* Now we wait for GRST to settle out.
* We don't have to delete the VEBs or VSIs from the hw switch
* because the reset will make them disappear.
@@ -12222,6 +12225,8 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
vsi->req_queue_pairs = queue_count;
i40e_prep_for_reset(pf);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return pf->alloc_rss_size;
pf->alloc_rss_size = new_rss_size;
@@ -13048,6 +13053,10 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
if (need_reset)
i40e_prep_for_reset(pf);
+ /* VSI shall be deleted in a moment, just return EINVAL */
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return -EINVAL;
+
old_prog = xchg(&vsi->xdp_prog, prog);
if (need_reset) {
@@ -15938,8 +15947,13 @@ static void i40e_remove(struct pci_dev *pdev)
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), 0);
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), 0);
- while (test_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
+ /* Grab __I40E_RESET_RECOVERY_PENDING and set __I40E_IN_REMOVE
+ * flags, once they are set, i40e_rebuild should not be called as
+ * i40e_prep_for_reset always returns early.
+ */
+ while (test_and_set_bit(__I40E_RESET_RECOVERY_PENDING, pf->state))
usleep_range(1000, 2000);
+ set_bit(__I40E_IN_REMOVE, pf->state);
if (pf->flags & I40E_FLAG_SRIOV_ENABLED) {
set_bit(__I40E_VF_RESETS_DISABLED, pf->state);
@@ -16138,6 +16152,9 @@ static void i40e_pci_error_reset_done(struct pci_dev *pdev)
{
struct i40e_pf *pf = pci_get_drvdata(pdev);
+ if (test_bit(__I40E_IN_REMOVE, pf->state))
+ return;
+
i40e_reset_and_rebuild(pf, false, false);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3d2504663c41104b4359a15f35670cfa82de1bbf Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Tue, 14 Dec 2021 10:08:22 +0000
Subject: [PATCH] i40e: Fix reset bw limit when DCB enabled with 1 TC
There was an AQ error I40E_AQ_RC_EINVAL when trying
to reset bw limit as part of bw allocation setup.
This was caused by trying to reset bw limit with
DCB enabled. Bw limit should not be reset when
DCB is enabled. The code was relying on the pf->flags
to check if DCB is enabled but if only 1 TC is available
this flag will not be set even though DCB is enabled.
Add a check for number of TC and if it is 1
don't try to reset bw limit even if pf->flags shows
DCB as disabled.
Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled")
Suggested-by: Alexander Lobakin <alexandr.lobakin(a)intel.com> # Flatten the condition
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin(a)intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f70c478dafdb..5cb4dc69fe87 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5372,7 +5372,15 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
/* There is no need to reset BW when mqprio mode is on. */
if (pf->flags & I40E_FLAG_TC_MQPRIO)
return 0;
- if (!vsi->mqprio_qopt.qopt.hw && !(pf->flags & I40E_FLAG_DCB_ENABLED)) {
+
+ if (!vsi->mqprio_qopt.qopt.hw) {
+ if (pf->flags & I40E_FLAG_DCB_ENABLED)
+ goto skip_reset;
+
+ if (IS_ENABLED(CONFIG_I40E_DCB) &&
+ i40e_dcb_hw_get_num_tc(&pf->hw) == 1)
+ goto skip_reset;
+
ret = i40e_set_bw_limit(vsi, vsi->seid, 0);
if (ret)
dev_info(&pf->pdev->dev,
@@ -5380,6 +5388,8 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
vsi->seid);
return ret;
}
+
+skip_reset:
memset(&bw_data, 0, sizeof(bw_data));
bw_data.tc_valid_bits = enabled_tc;
for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3d2504663c41104b4359a15f35670cfa82de1bbf Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Tue, 14 Dec 2021 10:08:22 +0000
Subject: [PATCH] i40e: Fix reset bw limit when DCB enabled with 1 TC
There was an AQ error I40E_AQ_RC_EINVAL when trying
to reset bw limit as part of bw allocation setup.
This was caused by trying to reset bw limit with
DCB enabled. Bw limit should not be reset when
DCB is enabled. The code was relying on the pf->flags
to check if DCB is enabled but if only 1 TC is available
this flag will not be set even though DCB is enabled.
Add a check for number of TC and if it is 1
don't try to reset bw limit even if pf->flags shows
DCB as disabled.
Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled")
Suggested-by: Alexander Lobakin <alexandr.lobakin(a)intel.com> # Flatten the condition
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin(a)intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f70c478dafdb..5cb4dc69fe87 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5372,7 +5372,15 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
/* There is no need to reset BW when mqprio mode is on. */
if (pf->flags & I40E_FLAG_TC_MQPRIO)
return 0;
- if (!vsi->mqprio_qopt.qopt.hw && !(pf->flags & I40E_FLAG_DCB_ENABLED)) {
+
+ if (!vsi->mqprio_qopt.qopt.hw) {
+ if (pf->flags & I40E_FLAG_DCB_ENABLED)
+ goto skip_reset;
+
+ if (IS_ENABLED(CONFIG_I40E_DCB) &&
+ i40e_dcb_hw_get_num_tc(&pf->hw) == 1)
+ goto skip_reset;
+
ret = i40e_set_bw_limit(vsi, vsi->seid, 0);
if (ret)
dev_info(&pf->pdev->dev,
@@ -5380,6 +5388,8 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
vsi->seid);
return ret;
}
+
+skip_reset:
memset(&bw_data, 0, sizeof(bw_data));
bw_data.tc_valid_bits = enabled_tc;
for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3d2504663c41104b4359a15f35670cfa82de1bbf Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Tue, 14 Dec 2021 10:08:22 +0000
Subject: [PATCH] i40e: Fix reset bw limit when DCB enabled with 1 TC
There was an AQ error I40E_AQ_RC_EINVAL when trying
to reset bw limit as part of bw allocation setup.
This was caused by trying to reset bw limit with
DCB enabled. Bw limit should not be reset when
DCB is enabled. The code was relying on the pf->flags
to check if DCB is enabled but if only 1 TC is available
this flag will not be set even though DCB is enabled.
Add a check for number of TC and if it is 1
don't try to reset bw limit even if pf->flags shows
DCB as disabled.
Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled")
Suggested-by: Alexander Lobakin <alexandr.lobakin(a)intel.com> # Flatten the condition
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin(a)intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f70c478dafdb..5cb4dc69fe87 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5372,7 +5372,15 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
/* There is no need to reset BW when mqprio mode is on. */
if (pf->flags & I40E_FLAG_TC_MQPRIO)
return 0;
- if (!vsi->mqprio_qopt.qopt.hw && !(pf->flags & I40E_FLAG_DCB_ENABLED)) {
+
+ if (!vsi->mqprio_qopt.qopt.hw) {
+ if (pf->flags & I40E_FLAG_DCB_ENABLED)
+ goto skip_reset;
+
+ if (IS_ENABLED(CONFIG_I40E_DCB) &&
+ i40e_dcb_hw_get_num_tc(&pf->hw) == 1)
+ goto skip_reset;
+
ret = i40e_set_bw_limit(vsi, vsi->seid, 0);
if (ret)
dev_info(&pf->pdev->dev,
@@ -5380,6 +5388,8 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
vsi->seid);
return ret;
}
+
+skip_reset:
memset(&bw_data, 0, sizeof(bw_data));
bw_data.tc_valid_bits = enabled_tc;
for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623ef8a118838aae65363750dfafcba734dc8cb Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Mon, 17 Jan 2022 15:00:30 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with forward and drop actions
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 671f76c350db..4c6e3c26c1ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (!(~actions &
+ (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_DROP))) {
+ NL_SET_ERR_MSG_MOD(extack, "Rule cannot support forward+drop action");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623ef8a118838aae65363750dfafcba734dc8cb Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Mon, 17 Jan 2022 15:00:30 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with forward and drop actions
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 671f76c350db..4c6e3c26c1ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (!(~actions &
+ (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_DROP))) {
+ NL_SET_ERR_MSG_MOD(extack, "Rule cannot support forward+drop action");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623ef8a118838aae65363750dfafcba734dc8cb Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Mon, 17 Jan 2022 15:00:30 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with forward and drop actions
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 671f76c350db..4c6e3c26c1ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (!(~actions &
+ (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_DROP))) {
+ NL_SET_ERR_MSG_MOD(extack, "Rule cannot support forward+drop action");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623ef8a118838aae65363750dfafcba734dc8cb Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Mon, 17 Jan 2022 15:00:30 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with forward and drop actions
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 671f76c350db..4c6e3c26c1ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (!(~actions &
+ (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_DROP))) {
+ NL_SET_ERR_MSG_MOD(extack, "Rule cannot support forward+drop action");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623ef8a118838aae65363750dfafcba734dc8cb Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Mon, 17 Jan 2022 15:00:30 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with forward and drop actions
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 671f76c350db..4c6e3c26c1ab 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (!(~actions &
+ (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_DROP))) {
+ NL_SET_ERR_MSG_MOD(extack, "Rule cannot support forward+drop action");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2446bc77a16cefd27de712d28af2396d6287593 Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Tue, 4 Jan 2022 10:38:02 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with drop and modify hdr action
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3d908a7e1406..671f76c350db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
+ actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
+ NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
!modify_header_match_supported(priv, &parse_attr->spec, flow_action,
actions, ct_flow, ct_clear, extack))
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2446bc77a16cefd27de712d28af2396d6287593 Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Tue, 4 Jan 2022 10:38:02 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with drop and modify hdr action
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3d908a7e1406..671f76c350db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
+ actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
+ NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
!modify_header_match_supported(priv, &parse_attr->spec, flow_action,
actions, ct_flow, ct_clear, extack))
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2446bc77a16cefd27de712d28af2396d6287593 Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Tue, 4 Jan 2022 10:38:02 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with drop and modify hdr action
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3d908a7e1406..671f76c350db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
+ actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
+ NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
!modify_header_match_supported(priv, &parse_attr->spec, flow_action,
actions, ct_flow, ct_clear, extack))
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2446bc77a16cefd27de712d28af2396d6287593 Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Tue, 4 Jan 2022 10:38:02 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with drop and modify hdr action
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3d908a7e1406..671f76c350db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
+ actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
+ NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
!modify_header_match_supported(priv, &parse_attr->spec, flow_action,
actions, ct_flow, ct_clear, extack))
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2446bc77a16cefd27de712d28af2396d6287593 Mon Sep 17 00:00:00 2001
From: Roi Dayan <roid(a)nvidia.com>
Date: Tue, 4 Jan 2022 10:38:02 +0200
Subject: [PATCH] net/mlx5e: TC, Reject rules with drop and modify hdr action
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid(a)nvidia.com>
Reviewed-by: Oz Shlomo <ozsh(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3d908a7e1406..671f76c350db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3191,6 +3191,12 @@ actions_match_supported(struct mlx5e_priv *priv,
return false;
}
+ if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
+ actions & MLX5_FLOW_CONTEXT_ACTION_DROP) {
+ NL_SET_ERR_MSG_MOD(extack, "Drop with modify header action is not supported");
+ return false;
+ }
+
if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR &&
!modify_header_match_supported(priv, &parse_attr->spec, flow_action,
actions, ct_flow, ct_clear, extack))
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5352859b3bfa0ca188b2f1d2c1436fddc781e3b6 Mon Sep 17 00:00:00 2001
From: Raed Salem <raeds(a)nvidia.com>
Date: Thu, 2 Dec 2021 17:43:50 +0200
Subject: [PATCH] net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP
encapsulated traffic
IPsec crypto offload always set the ethernet segment checksum flags with
the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
packet regardless of the encapsulated L4 header type, and even if it
doesn't exists in the first place, this breaks non TCP/UDP traffic as
such.
Set the inner L4 checksum flag only when the encapsulated L4 header
protocol is TCP/UDP using software parser swp_inner_l4_offset field as
indication.
Fixes: 5cfb540ef27b ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
Signed-off-by: Raed Salem <raeds(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
index b98db50c3418..428881e0adcb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
@@ -131,14 +131,17 @@ static inline bool
mlx5e_ipsec_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
struct mlx5_wqe_eth_seg *eseg)
{
- struct xfrm_offload *xo = xfrm_offload(skb);
+ u8 inner_ipproto;
if (!mlx5e_ipsec_eseg_meta(eseg))
return false;
eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
- if (xo->inner_ipproto) {
- eseg->cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM | MLX5_ETH_WQE_L3_INNER_CSUM;
+ inner_ipproto = xfrm_offload(skb)->inner_ipproto;
+ if (inner_ipproto) {
+ eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM;
+ if (inner_ipproto == IPPROTO_TCP || inner_ipproto == IPPROTO_UDP)
+ eseg->cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM;
} else if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
sq->stats->csum_partial_inner++;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5352859b3bfa0ca188b2f1d2c1436fddc781e3b6 Mon Sep 17 00:00:00 2001
From: Raed Salem <raeds(a)nvidia.com>
Date: Thu, 2 Dec 2021 17:43:50 +0200
Subject: [PATCH] net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP
encapsulated traffic
IPsec crypto offload always set the ethernet segment checksum flags with
the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
packet regardless of the encapsulated L4 header type, and even if it
doesn't exists in the first place, this breaks non TCP/UDP traffic as
such.
Set the inner L4 checksum flag only when the encapsulated L4 header
protocol is TCP/UDP using software parser swp_inner_l4_offset field as
indication.
Fixes: 5cfb540ef27b ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
Signed-off-by: Raed Salem <raeds(a)nvidia.com>
Reviewed-by: Maor Dickman <maord(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
index b98db50c3418..428881e0adcb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
@@ -131,14 +131,17 @@ static inline bool
mlx5e_ipsec_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
struct mlx5_wqe_eth_seg *eseg)
{
- struct xfrm_offload *xo = xfrm_offload(skb);
+ u8 inner_ipproto;
if (!mlx5e_ipsec_eseg_meta(eseg))
return false;
eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
- if (xo->inner_ipproto) {
- eseg->cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM | MLX5_ETH_WQE_L3_INNER_CSUM;
+ inner_ipproto = xfrm_offload(skb)->inner_ipproto;
+ if (inner_ipproto) {
+ eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM;
+ if (inner_ipproto == IPPROTO_TCP || inner_ipproto == IPPROTO_UDP)
+ eseg->cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM;
} else if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
sq->stats->csum_partial_inner++;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24f6008564183aa120d07c03d9289519c2fe02af Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 20 Jan 2022 11:04:01 -0600
Subject: [PATCH] cgroup-v1: Require capabilities to set release_agent
The cgroup release_agent is called with call_usermodehelper. The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.
Reported-by: Tabitha Sable <tabitha.c.sable(a)gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable(a)gmail.com>
Fixes: 81a6a5cdd2c5 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable(a)vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 41e0837a5a0b..0e877dbcfeea 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -549,6 +549,14 @@ static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
BUILD_BUG_ON(sizeof(cgrp->root->release_agent_path) < PATH_MAX);
+ /*
+ * Release agent gets called with all capabilities,
+ * require capabilities to set release agent.
+ */
+ if ((of->file->f_cred->user_ns != &init_user_ns) ||
+ !capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
cgrp = cgroup_kn_lock_live(of->kn, false);
if (!cgrp)
return -ENODEV;
@@ -954,6 +962,12 @@ int cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param)
/* Specifying two release agents is forbidden */
if (ctx->release_agent)
return invalfc(fc, "release_agent respecified");
+ /*
+ * Release agent gets called with all capabilities,
+ * require capabilities to set release agent.
+ */
+ if ((fc->user_ns != &init_user_ns) || !capable(CAP_SYS_ADMIN))
+ return invalfc(fc, "Setting release_agent not allowed");
ctx->release_agent = param->string;
param->string = NULL;
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24f6008564183aa120d07c03d9289519c2fe02af Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 20 Jan 2022 11:04:01 -0600
Subject: [PATCH] cgroup-v1: Require capabilities to set release_agent
The cgroup release_agent is called with call_usermodehelper. The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.
Reported-by: Tabitha Sable <tabitha.c.sable(a)gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable(a)gmail.com>
Fixes: 81a6a5cdd2c5 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable(a)vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 41e0837a5a0b..0e877dbcfeea 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -549,6 +549,14 @@ static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
BUILD_BUG_ON(sizeof(cgrp->root->release_agent_path) < PATH_MAX);
+ /*
+ * Release agent gets called with all capabilities,
+ * require capabilities to set release agent.
+ */
+ if ((of->file->f_cred->user_ns != &init_user_ns) ||
+ !capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
cgrp = cgroup_kn_lock_live(of->kn, false);
if (!cgrp)
return -ENODEV;
@@ -954,6 +962,12 @@ int cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param)
/* Specifying two release agents is forbidden */
if (ctx->release_agent)
return invalfc(fc, "release_agent respecified");
+ /*
+ * Release agent gets called with all capabilities,
+ * require capabilities to set release agent.
+ */
+ if ((fc->user_ns != &init_user_ns) || !capable(CAP_SYS_ADMIN))
+ return invalfc(fc, "Setting release_agent not allowed");
ctx->release_agent = param->string;
param->string = NULL;
break;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24f6008564183aa120d07c03d9289519c2fe02af Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 20 Jan 2022 11:04:01 -0600
Subject: [PATCH] cgroup-v1: Require capabilities to set release_agent
The cgroup release_agent is called with call_usermodehelper. The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.
Reported-by: Tabitha Sable <tabitha.c.sable(a)gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable(a)gmail.com>
Fixes: 81a6a5cdd2c5 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable(a)vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 41e0837a5a0b..0e877dbcfeea 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -549,6 +549,14 @@ static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
BUILD_BUG_ON(sizeof(cgrp->root->release_agent_path) < PATH_MAX);
+ /*
+ * Release agent gets called with all capabilities,
+ * require capabilities to set release agent.
+ */
+ if ((of->file->f_cred->user_ns != &init_user_ns) ||
+ !capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
cgrp = cgroup_kn_lock_live(of->kn, false);
if (!cgrp)
return -ENODEV;
@@ -954,6 +962,12 @@ int cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param)
/* Specifying two release agents is forbidden */
if (ctx->release_agent)
return invalfc(fc, "release_agent respecified");
+ /*
+ * Release agent gets called with all capabilities,
+ * require capabilities to set release agent.
+ */
+ if ((fc->user_ns != &init_user_ns) || !capable(CAP_SYS_ADMIN))
+ return invalfc(fc, "Setting release_agent not allowed");
ctx->release_agent = param->string;
param->string = NULL;
break;
I NEED TRUST.
Hope you are in good health with your family.
I am Mr.Komi Zongo. I work as the Foreign Operations Manager with
one of the international banks here in Burkina Faso. Although the
world is a very small place and hard place to meet people because you
don't know who to trust or believe, as I have developed trust in you
after my fasting and praying, I made up my mind to confide this
confidential business suggestion to you.
There is an overdue unclaimed sum of Ten Million Five Hundred Thousand
United States Dollars ($10,500,000.00) in my bank, belonging to one of
our dead foreign customers. There were no beneficiaries stated
concerning these funds. Therefore, your request as a foreigner is
necessary to apply for the claim and release of the fund smoothly into
your reliable bank account as the Foreign Business Partner to the
deceased.
On the transfer of this fund in your account, you will take 40% as
your share from the total fund, 5% will be shared to Charitable
Organizations while Motherless Babies homes, disabled helpless as the
balance of 55% will be for me. If you are really sure of your
integrity, trustworthy, and confidentiality, reply urgently and to
prove that, include your particulars as follows.
Please get back to me through this Email Address komizongo2020(a)gmail.com
please fill in your personal information as indicated below and as
soon as i receive this information below i will forward you a text of an
application which you will fill and send to the bank for the claim of the
fund as i will direct you on what to do.
Your name in full.......................... ........
Your country....................... ..................
Your age........................... ....................
Your cell phone......................... ...........
Your occupation.................... ...............
Your sex........................... ....................
Your marital status........................ .......
Your id card or passport...........................
Best Regards,
Mr.Komi Zongo.
--
Hello Dear,
how are you today?hope you are fine
My name is Dr Ava Smith ,Am an English and French nationalities.
I will give you pictures and more details about me as soon as i hear from you
Thanks
Ava
This is the start of the stable review cycle for the 4.14.264 release.
There are 2 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 29 Jan 2022 18:02:51 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.264-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.264-rc1
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: bcm: fix UAF of bcm op
Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
drm/i915: Flush TLBs before releasing backing store
-------------
Diffstat:
Makefile | 4 +-
drivers/gpu/drm/i915/i915_drv.h | 2 +
drivers/gpu/drm/i915/i915_gem.c | 84 +++++++++++++++++++++++++++++++++-
drivers/gpu/drm/i915/i915_gem_object.h | 1 +
drivers/gpu/drm/i915/i915_reg.h | 6 +++
drivers/gpu/drm/i915/i915_vma.c | 4 ++
net/can/bcm.c | 20 ++++----
7 files changed, 108 insertions(+), 13 deletions(-)
From: Lang Yu <lang.yu(a)amd.com>
Subject: mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to add
ZONE_DEVICE memory, if requested free mem region's end pfn were huge(e.g.,
0x400000000), the node_end_pfn() will be also huge (see
move_pfn_range_to_zone()). Thus it creates a huge hole between
node_start_pfn() and node_end_pfn().
We found on some AMD APUs, amdkfd requested such a free mem region and
created a huge hole. In such a case, following code snippet was just
doing busy test_bit() looping on the huge hole.
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_online_page(pfn);
if (!page)
continue;
...
}
So we got a soft lockup:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
RIP: 0010:pfn_to_online_page+0x5/0xd0
Call Trace:
? kmemleak_scan+0x16a/0x440
kmemleak_write+0x306/0x3a0
? common_file_perm+0x72/0x170
full_proxy_write+0x5c/0x90
vfs_write+0xb9/0x260
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
I did some tests with the patch.
(1) amdgpu module unloaded
before the patch:
real 0m0.976s
user 0m0.000s
sys 0m0.968s
after the patch:
real 0m0.981s
user 0m0.000s
sys 0m0.973s
(2) amdgpu module loaded
before the patch:
real 0m35.365s
user 0m0.000s
sys 0m35.354s
after the patch:
real 0m1.049s
user 0m0.000s
sys 0m1.042s
Link: https://lkml.kernel.org/r/20211108140029.721144-1-lang.yu@amd.com
Signed-off-by: Lang Yu <lang.yu(a)amd.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-scanning-potential-huge-holes
+++ a/mm/kmemleak.c
@@ -1410,7 +1410,8 @@ static void kmemleak_scan(void)
{
unsigned long flags;
struct kmemleak_object *object;
- int i;
+ struct zone *zone;
+ int __maybe_unused i;
int new_leaks = 0;
jiffies_last_scan = jiffies;
@@ -1450,9 +1451,9 @@ static void kmemleak_scan(void)
* Struct page scanning for each node.
*/
get_online_mems();
- for_each_online_node(i) {
- unsigned long start_pfn = node_start_pfn(i);
- unsigned long end_pfn = node_end_pfn(i);
+ for_each_populated_zone(zone) {
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -1461,8 +1462,8 @@ static void kmemleak_scan(void)
if (!page)
continue;
- /* only scan pages belonging to this node */
- if (page_to_nid(page) != i)
+ /* only scan pages belonging to this zone */
+ if (page_zone(page) != zone)
continue;
/* only scan if page is in use */
if (page_count(page) == 0)
_
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm/pgtable: define pte_index so that preprocessor could recognize it
Since commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*()
definitions") pte_index is a static inline and there is no define for it
that can be recognized by the preprocessor. As a result,
vm_insert_pages() uses slower loop over vm_insert_page() instead of
insert_pages() that amortizes the cost of spinlock operations when
inserting multiple pages.
Link: https://lkml.kernel.org/r/20220111145457.20748-1-rppt@kernel.org
Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Christian Dietrich <stettberger(a)dokucode.de>
Reviewed-by: Khalid Aziz <khalid.aziz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgtable.h | 1 +
1 file changed, 1 insertion(+)
--- a/include/linux/pgtable.h~mm-pgtable-define-pte_index-so-that-preprocessor-could-recognize-it
+++ a/include/linux/pgtable.h
@@ -62,6 +62,7 @@ static inline unsigned long pte_index(un
{
return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}
+#define pte_index pte_index
#ifndef pmd_index
static inline unsigned long pmd_index(unsigned long address)
_
From: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Subject: mm/debug_vm_pgtable: remove pte entry from the page table
Patch series "page table check fixes and cleanups", v5.
This patch (of 4):
The pte entry that is used in pte_advanced_tests() is never removed from
the page table at the end of the test.
The issue is detected by page_table_check, to repro compile kernel with
the following configs:
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_PAGE_TABLE_CHECK=y
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
During the boot the following BUG is printed:
[ 2.262821] debug_vm_pgtable: [debug_vm_pgtable ]: Validating
architecture page table helpers
[ 2.276826] ------------[ cut here ]------------
[ 2.280426] kernel BUG at mm/page_table_check.c:162!
[ 2.284118] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 2.287787] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.16.0-11413-g2c271fe77d52 #3
[ 2.293226] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org
04/01/2014
...
The entry should be properly removed from the page table before the page
is released to the free list.
Link: https://lkml.kernel.org/r/20220131203249.2832273-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20220131203249.2832273-2-pasha.tatashin@soleen.com
Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Tested-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug_vm_pgtable.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-remove-pte-entry-from-the-page-table
+++ a/mm/debug_vm_pgtable.c
@@ -171,6 +171,8 @@ static void __init pte_advanced_tests(st
ptep_test_and_clear_young(args->vma, args->vaddr, args->ptep);
pte = ptep_get(args->ptep);
WARN_ON(pte_young(pte));
+
+ ptep_get_and_clear_full(args->mm, args->vaddr, args->ptep, 1);
}
static void __init pte_savedwrite_tests(struct pgtable_debug_args *args)
_
Commit 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update
integrity seed") added code to update the integrity seed value when
advancing a bio. However, it failed to take into account that the
integrity interval might be larger than the 512-byte block layer
sector size. This broke bio splitting on PI devices with 4KB logical
blocks.
The seed value should be advanced by bio_integrity_intervals() and not
the number of sectors.
Cc: Dmitry Monakhov <dmonakhov(a)openvz.org>
Cc: stable(a)vger.kernel.org
Fixes: 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update integrity seed")
Tested-by: Dmitry Ivanov <dmitry.ivanov2(a)hpe.com>
Reported-by: Alexey Lyashkov <alexey.lyashkov(a)hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
---
block/bio-integrity.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index d25114715459..0827b19820c5 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -373,7 +373,7 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
- bip->bip_iter.bi_sector += bytes_done >> 9;
+ bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
--
2.32.0
As Pavel Machek reported in below link [1]:
After commit 77900c45ee5c ("f2fs: fix to do sanity check in is_alive()"),
node page should be unlock via calling f2fs_put_page() in the error path
of is_alive(), otherwise, f2fs may hang when it tries to lock the node
page, fix it.
[1] https://lore.kernel.org/stable/20220124203637.GA19321@duo.ucw.cz/
Fixes: 77900c45ee5c ("f2fs: fix to do sanity check in is_alive()")
Cc: <stable(a)vger.kernel.org>
Reported-by: Pavel Machek <pavel(a)denx.de>
Signed-off-by: Pavel Machek <pavel(a)denx.de>
Signed-off-by: Chao Yu <chao(a)kernel.org>
---
v2:
- Cc stable-kernel mailing list.
fs/f2fs/gc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 0a6b0a8ae97e..2d53ef121e76 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1038,8 +1038,10 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
set_sbi_flag(sbi, SBI_NEED_FSCK);
}
- if (f2fs_check_nid_range(sbi, dni->ino))
+ if (f2fs_check_nid_range(sbi, dni->ino)) {
+ f2fs_put_page(node_page, 1);
return false;
+ }
*nofs = ofs_of_node(node_page);
source_blkaddr = data_blkaddr(NULL, node_page, ofs_in_node);
--
2.32.0
I’m Mr. Mathieu a banker working with a commercial bank as a Chief
Financial Officer. I have a business deal for our mutual benefit. One
of our customers died few years ago without any registered next-of-kin
in his bio-data/account file which is peculiar with high profile
investors. I think he was your relative because you both share the
same family's name. Right now my bank wants to sit on this money (US$
13 .5 Million) as unclaimed and abandoned funds which I do not want
because of reasons I will explain to you when I receive your response.
Please get back to me as fast as you can with your mobile telephone
number for more details on how the funds can be turned into yours by
right of inheritance for us to claim and share equally afterwards;
Best Regards, I am urgently expecting your reply.
Yours,
Mr. Mathieu
The patch titled
Subject: mm: memcg: synchronize objcg lists with a dedicated spinlock
has been added to the -mm tree. Its filename is
mm-memcg-synchronize-objcg-lists-with-a-dedicated-spinlock.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-synchronize-objcg-lists-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-memcg-synchronize-objcg-lists-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcg: synchronize objcg lists with a dedicated spinlock
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:
LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
WARNING: possible circular locking dependency detected
5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
------------------------------------------------------
mmap1/202299 is trying to acquire lock:
00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
but task is already holding lock:
00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sighand->siglock){-.-.}-{2:2}:
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
__lock_task_sighand+0x90/0x190
cgroup_freeze_task+0x2e/0x90
cgroup_migrate_execute+0x11c/0x608
cgroup_update_dfl_csses+0x246/0x270
cgroup_subtree_control_write+0x238/0x518
kernfs_fop_write_iter+0x13e/0x1e0
new_sync_write+0x100/0x190
vfs_write+0x22c/0x2d8
ksys_write+0x6c/0xf8
__do_syscall+0x1da/0x208
system_call+0x82/0xb0
-> #0 (css_set_lock){..-.}-{2:2}:
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(css_set_lock);
lock(&sighand->siglock);
lock(css_set_lock);
*** DEADLOCK ***
2 locks held by mmap1/202299:
#0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
#1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
stack backtrace:
CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
[<00000001888aacfe>] dump_stack_lvl+0x76/0x98
[<0000000187c6d7be>] check_noncircular+0x136/0x158
[<0000000187c6e888>] check_prev_add+0xe0/0xed8
[<0000000187c6fdb6>] validate_chain+0x736/0xb20
[<0000000187c71e54>] __lock_acquire+0x604/0xbd8
[<0000000187c7301a>] lock_acquire.part.0+0xe2/0x238
[<0000000187c73220>] lock_acquire+0xb0/0x200
[<00000001888bf9aa>] _raw_spin_lock_irqsave+0x6a/0xd8
[<0000000187ef6862>] obj_cgroup_release+0x4a/0xe0
[<0000000187ef6498>] percpu_ref_put_many.constprop.0+0x150/0x168
[<0000000187ef9674>] drain_obj_stock+0x94/0xe8
[<0000000187efa464>] refill_obj_stock+0x94/0x278
[<0000000187eff55c>] obj_cgroup_charge+0x164/0x1d8
[<0000000187ed8aa4>] kmem_cache_alloc+0xac/0x528
[<0000000187bf2eb8>] __sigqueue_alloc+0x150/0x308
[<0000000187bf4210>] __send_signal+0x260/0x550
[<0000000187bf5f06>] send_signal+0x7e/0x348
[<0000000187bf7274>] force_sig_info_to_task+0x104/0x180
[<0000000187bf7758>] force_sig_fault+0x48/0x58
[<00000001888ae160>] __do_pgm_check+0x120/0x1f0
[<00000001888c0cde>] pgm_check_handler+0x11e/0x180
INFO: lockdep is turned off.
In this example a slab allocation from __send_signal() caused a refilling
and draining of a percpu objcg stock, resulted in a releasing of another
non-related objcg. Objcg release path requires taking the css_set_lock,
which is used to synchronize objcg lists.
This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a task).
In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.
To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.
Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reported-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Reviewed-by: Waiman Long <longman(a)redhat.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Jeremy Linton <jeremy.linton(a)arm.com>
Tested-by: Jeremy Linton <jeremy.linton(a)arm.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 5 +++--
mm/memcontrol.c | 10 +++++-----
2 files changed, 8 insertions(+), 7 deletions(-)
--- a/include/linux/memcontrol.h~mm-memcg-synchronize-objcg-lists-with-a-dedicated-spinlock
+++ a/include/linux/memcontrol.h
@@ -219,7 +219,7 @@ struct obj_cgroup {
struct mem_cgroup *memcg;
atomic_t nr_charged_bytes;
union {
- struct list_head list;
+ struct list_head list; /* protected by objcg_lock */
struct rcu_head rcu;
};
};
@@ -315,7 +315,8 @@ struct mem_cgroup {
#ifdef CONFIG_MEMCG_KMEM
int kmemcg_id;
struct obj_cgroup __rcu *objcg;
- struct list_head objcg_list; /* list of inherited objcgs */
+ /* list of inherited objcgs, protected by objcg_lock */
+ struct list_head objcg_list;
#endif
MEMCG_PADDING(_pad2_);
--- a/mm/memcontrol.c~mm-memcg-synchronize-objcg-lists-with-a-dedicated-spinlock
+++ a/mm/memcontrol.c
@@ -254,7 +254,7 @@ struct mem_cgroup *vmpressure_to_memcg(s
}
#ifdef CONFIG_MEMCG_KMEM
-extern spinlock_t css_set_lock;
+static DEFINE_SPINLOCK(objcg_lock);
bool mem_cgroup_kmem_disabled(void)
{
@@ -298,9 +298,9 @@ static void obj_cgroup_release(struct pe
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
- spin_lock_irqsave(&css_set_lock, flags);
+ spin_lock_irqsave(&objcg_lock, flags);
list_del(&objcg->list);
- spin_unlock_irqrestore(&css_set_lock, flags);
+ spin_unlock_irqrestore(&objcg_lock, flags);
percpu_ref_exit(ref);
kfree_rcu(objcg, rcu);
@@ -332,7 +332,7 @@ static void memcg_reparent_objcgs(struct
objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
- spin_lock_irq(&css_set_lock);
+ spin_lock_irq(&objcg_lock);
/* 1) Ready to reparent active objcg. */
list_add(&objcg->list, &memcg->objcg_list);
@@ -342,7 +342,7 @@ static void memcg_reparent_objcgs(struct
/* 3) Move already reparented objcgs to the parent's list */
list_splice(&memcg->objcg_list, &parent->objcg_list);
- spin_unlock_irq(&css_set_lock);
+ spin_unlock_irq(&objcg_lock);
percpu_ref_kill(&objcg->refcnt);
}
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcg-synchronize-objcg-lists-with-a-dedicated-spinlock.patch
The patch titled
Subject: mm: vmscan: remove deadlock due to throttling failing to make progress
has been added to the -mm tree. Its filename is
mm-vmscan-remove-deadlock-due-to-throttling-failing-to-make-progress.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-vmscan-remove-deadlock-due-to-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-vmscan-remove-deadlock-due-to-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mel Gorman <mgorman(a)suse.de>
Subject: mm: vmscan: remove deadlock due to throttling failing to make progress
A soft lockup bug in kcompactd was reported in a private bugzilla with
the following visible in dmesg;
[15980.045209][ C33] watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
[16008.044989][ C33] watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
[16036.044768][ C33] watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
[16064.044548][ C33] watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]
The machine had 256G of RAM with no swap and an earlier failed allocation
indicated that node 0 where kcompactd was run was potentially
unreclaimable;
Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB
inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB
mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp:
0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB
kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes
Vlastimil Babka investigated a crash dump and found that a task migrating pages
was trying to drain PCP lists;
PID: 52922 TASK: ffff969f820e5000 CPU: 19 COMMAND: "kworker/u128:3"
#0 [ffffaf4e4f4c3848] __schedule at ffffffffb840116d
#1 [ffffaf4e4f4c3908] schedule at ffffffffb8401e81
#2 [ffffaf4e4f4c3918] schedule_timeout at ffffffffb84066e8
#3 [ffffaf4e4f4c3990] wait_for_completion at ffffffffb8403072
#4 [ffffaf4e4f4c39d0] __flush_work at ffffffffb7ac3e4d
#5 [ffffaf4e4f4c3a48] __drain_all_pages at ffffffffb7cb707c
#6 [ffffaf4e4f4c3a80] __alloc_pages_slowpath.constprop.114 at ffffffffb7cbd9dd
#7 [ffffaf4e4f4c3b60] __alloc_pages at ffffffffb7cbe4f5
#8 [ffffaf4e4f4c3bc0] alloc_migration_target at ffffffffb7cf329c
#9 [ffffaf4e4f4c3bf0] migrate_pages at ffffffffb7cf6d15
10 [ffffaf4e4f4c3cb0] migrate_to_node at ffffffffb7cdb5aa
11 [ffffaf4e4f4c3da8] do_migrate_pages at ffffffffb7cdcf26
12 [ffffaf4e4f4c3e88] cpuset_migrate_mm_workfn at ffffffffb7b859d2
13 [ffffaf4e4f4c3e98] process_one_work at ffffffffb7ac45f3
14 [ffffaf4e4f4c3ed8] worker_thread at ffffffffb7ac47fd
15 [ffffaf4e4f4c3f10] kthread at ffffffffb7acbdc6
16 [ffffaf4e4f4c3f50] ret_from_fork at ffffffffb7a047e2
This failure is specific to CONFIG_PREEMPT=n builds. The root of the
problem is that kcompact0 is not rescheduling on a CPU while a task that
has isolated a large number of the pages from the LRU is waiting on
kcompact0 to reschedule so the pages can be released. While
shrink_inactive_list() only loops once around too_many_isolated, reclaim
can continue without rescheduling if sc->skipped_deactivate == 1 which
could happen if there was no file LRU and the inactive anon list was not
low.
Link: https://lkml.kernel.org/r/20220203100326.GD3301@suse.de
Fixes: d818fca1cac3 ("mm/vmscan: throttle reclaim and compaction when too may pages are isolated")
Signed-off-by: Mel Gorman <mgorman(a)suse.de>
Debugged-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-remove-deadlock-due-to-throttling-failing-to-make-progress
+++ a/mm/vmscan.c
@@ -1066,8 +1066,10 @@ void reclaim_throttle(pg_data_t *pgdat,
* forward progress (e.g. journalling workqueues or kthreads).
*/
if (!current_is_kswapd() &&
- current->flags & (PF_IO_WORKER|PF_KTHREAD))
+ current->flags & (PF_IO_WORKER|PF_KTHREAD)) {
+ cond_resched();
return;
+ }
/*
* These figures are pulled out of thin air.
_
Patches currently in -mm which might be from mgorman(a)suse.de are
mm-vmscan-remove-deadlock-due-to-throttling-failing-to-make-progress.patch
The patch titled
Subject: fs-proc-task_mmuc-dont-read-mapcount-for-migration-entry-v4
has been added to the -mm tree. Its filename is
fs-proc-task_mmuc-dont-read-mapcount-for-migration-entry-v4.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/fs-proc-task_mmuc-dont-read-mapco…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/fs-proc-task_mmuc-dont-read-mapco…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yang Shi <shy828301(a)gmail.com>
Subject: fs-proc-task_mmuc-dont-read-mapcount-for-migration-entry-v4
v4: * s/Treated/Treat per David
* Collected acked-by tag from David
v3: * Fixed the fix tag, the one used by v2 was not accurate
* Added comment about the risk calling page_mapcount() per David
* Fix pagemap
Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()")
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reported-by: syzbot+1f52b3a18d5633fa7f82(a)syzkaller.appspotmail.com
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
--- a/fs/proc/task_mmu.c~fs-proc-task_mmuc-dont-read-mapcount-for-migration-entry-v4
+++ a/fs/proc/task_mmu.c
@@ -469,9 +469,12 @@ static void smaps_account(struct mem_siz
* If any subpage of the compound page mapped with PTE it would elevate
* page_count().
*
- * Treated regular migration entries as mapcount == 1 without reading
- * mapcount since calling page_mapcount() for migration entries is
- * racy against THP splitting.
+ * The page_mapcount() is called to get a snapshot of the mapcount.
+ * Without holding the page lock this snapshot can be slightly wrong as
+ * we cannot always read the mapcount atomically. It is not safe to
+ * call page_mapcount() even with PTL held if the page is not mapped,
+ * especially for migration entries. Treat regular migration entries
+ * as mapcount == 1.
*/
if ((page_count(page) == 1) || migration) {
smaps_page_accumulate(mss, page, size, size << PSS_SHIFT, dirty,
@@ -1393,6 +1396,7 @@ static pagemap_entry_t pte_to_pagemap_en
{
u64 frame = 0, flags = 0;
struct page *page = NULL;
+ bool migration = false;
if (pte_present(pte)) {
if (pm->show_pfn)
@@ -1414,13 +1418,14 @@ static pagemap_entry_t pte_to_pagemap_en
frame = swp_type(entry) |
(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
flags |= PM_SWAP;
+ migration = is_migration_entry(entry);
if (is_pfn_swap_entry(entry))
page = pfn_swap_entry_to_page(entry);
}
if (page && !PageAnon(page))
flags |= PM_FILE;
- if (page && page_mapcount(page) == 1)
+ if (page && !migration && page_mapcount(page) == 1)
flags |= PM_MMAP_EXCLUSIVE;
if (vma->vm_flags & VM_SOFTDIRTY)
flags |= PM_SOFT_DIRTY;
@@ -1436,6 +1441,7 @@ static int pagemap_pmd_range(pmd_t *pmdp
spinlock_t *ptl;
pte_t *pte, *orig_pte;
int err = 0;
+ bool migration = false;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
ptl = pmd_trans_huge_lock(pmdp, vma);
@@ -1476,11 +1482,12 @@ static int pagemap_pmd_range(pmd_t *pmdp
if (pmd_swp_uffd_wp(pmd))
flags |= PM_UFFD_WP;
VM_BUG_ON(!is_pmd_migration_entry(pmd));
+ migration = is_migration_entry(entry);
page = pfn_swap_entry_to_page(entry);
}
#endif
- if (page && page_mapcount(page) == 1)
+ if (page && !migration && page_mapcount(page) == 1)
flags |= PM_MMAP_EXCLUSIVE;
for (; addr != end; addr += PAGE_SIZE) {
_
Patches currently in -mm which might be from shy828301(a)gmail.com are
fs-proc-task_mmuc-dont-read-mapcount-for-migration-entry.patch
fs-proc-task_mmuc-dont-read-mapcount-for-migration-entry-v4.patch
Note: this applies to 5.10 stable only. It doesn't trigger on anything
above 5.10 as the code there has been substantially reworked. This also
doesn't apply to any stable kernel below 5.10 afaict.
Syzbot found a bug: KASAN: invalid-free in io_dismantle_req
https://syzkaller.appspot.com/bug?id=123d9a852fc88ba573ffcb2dbcf4f9576c3b05…
The test submits bunch of io_uring writes and exits, which then triggers
uring_task_cancel() and io_put_identity(), which in some corner cases,
tries to free a static identity. This causes a panic as shown in the
trace below:
BUG: KASAN: double-free or invalid-free in kfree+0xd5/0x310
CPU: 0 PID: 4618 Comm: repro Not tainted 5.10.76-05281-g4944ec82ebb9-dirty #17
Call Trace:
dump_stack_lvl+0x1b2/0x21b
print_address_description+0x8d/0x3b0
kasan_report_invalid_free+0x58/0x130
____kasan_slab_free+0x14b/0x170
__kasan_slab_free+0x11/0x20
slab_free_freelist_hook+0xcc/0x1a0
kfree+0xd5/0x310
io_dismantle_req+0x9b0/0xd90
io_do_iopoll+0x13a4/0x23e0
io_iopoll_try_reap_events+0x116/0x290
io_uring_cancel_task_requests+0x197d/0x1ee0
io_uring_flush+0x170/0x6d0
filp_close+0xb0/0x150
put_files_struct+0x1d4/0x350
exit_files+0x80/0xa0
do_exit+0x6d9/0x2390
do_group_exit+0x16a/0x2d0
get_signal+0x133e/0x1f80
arch_do_signal+0x7b/0x610
exit_to_user_mode_prepare+0xaa/0xe0
syscall_exit_to_user_mode+0x24/0x40
do_syscall_64+0x3d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 4611:
____kasan_kmalloc+0xcd/0x100
__kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x208/0x390
io_uring_alloc_task_context+0x57/0x550
io_uring_add_task_file+0x1f7/0x290
io_uring_create+0x2195/0x3490
__x64_sys_io_uring_setup+0x1bf/0x280
do_syscall_64+0x31/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88810732b500
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 88 bytes inside of
192-byte region [ffff88810732b500, ffff88810732b5c0)
Kernel panic - not syncing: panic_on_warn set ...
This issue bisected to this commit:
commit 186725a80c4e ("io_uring: fix skipping disabling sqo on exec")
Simple reverting the offending commit doesn't work as it hits some
other, related issues like:
/* sqo_dead check is for when this happens after cancellation */
WARN_ON_ONCE(ctx->sqo_task == current && !ctx->sqo_dead &&
!xa_load(&tctx->xa, (unsigned long)file));
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5622 at fs/io_uring.c:8960 io_uring_flush+0x5bc/0x6d0
Modules linked in:
CPU: 1 PID: 5622 Comm: repro Not tainted 5.10.76-05281-g4944ec82ebb9-dirty #16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014
RIP: 0010:io_uring_flush+0x5bc/0x6d0
Call Trace:
filp_close+0xb0/0x150
put_files_struct+0x1d4/0x350
reset_files_struct+0x88/0xa0
bprm_execve+0x7f2/0x9f0
do_execveat_common+0x46f/0x5d0
__x64_sys_execve+0x92/0xb0
do_syscall_64+0x31/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Changing __io_uring_task_cancel() to call io_disable_sqo_submit() directly,
as the comment suggests, only if __io_uring_files_cancel() is not executed
seems to fix the issue.
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: <io-uring(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot+6055980d041c8ac23307(a)syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
---
fs/io_uring.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0736487165da..fcf9ffe9b209 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8882,20 +8882,18 @@ void __io_uring_task_cancel(void)
struct io_uring_task *tctx = current->io_uring;
DEFINE_WAIT(wait);
s64 inflight;
+ int canceled = 0;
/* make sure overflow events are dropped */
atomic_inc(&tctx->in_idle);
- /* trigger io_disable_sqo_submit() */
- if (tctx->sqpoll)
- __io_uring_files_cancel(NULL);
-
do {
/* read completions before cancelations */
inflight = tctx_inflight(tctx);
if (!inflight)
break;
__io_uring_files_cancel(NULL);
+ canceled = 1;
prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
@@ -8909,6 +8907,21 @@ void __io_uring_task_cancel(void)
finish_wait(&tctx->wait, &wait);
} while (1);
+ /*
+ * trigger io_disable_sqo_submit()
+ * if not already done by __io_uring_files_cancel()
+ */
+ if (tctx->sqpoll && !canceled) {
+ struct file *file;
+ unsigned long index;
+
+ xa_for_each(&tctx->xa, index, file) {
+ struct io_ring_ctx *ctx = file->private_data;
+
+ io_disable_sqo_submit(ctx);
+ }
+ }
+
atomic_dec(&tctx->in_idle);
io_uring_remove_task_files(tctx);
--
2.33.1
Commit 20b0dfa86bef0e80b41b0e5ac38b92f23b6f27f9 upstream.
The original commit depended on a rework commit (724fc856c09e ("drm/vc4:
hdmi: Split the CEC disable / enable functions in two")) that
(rightfully) didn't reach stable.
However, probably because the context changed, when the patch was
applied to stable the pm_runtime_put called got moved to the end of the
vc4_hdmi_cec_adap_enable function (that would have become
vc4_hdmi_cec_disable with the rework) to vc4_hdmi_cec_init.
This means that at probe time, we now drop our reference to the clocks
and power domains and thus end up with a CPU hang when the CPU tries to
access registers.
The call to pm_runtime_resume_and_get() is also problematic since the
.adap_enable CEC hook is called both to enable and to disable the
controller. That means that we'll now call pm_runtime_resume_and_get()
at disable time as well, messing with the reference counting.
The behaviour we should have though would be to have
pm_runtime_resume_and_get() called when the CEC controller is enabled,
and pm_runtime_put when it's disabled.
We need to move things around a bit to behave that way, but it aligns
stable with upstream.
Cc: <stable(a)vger.kernel.org> # 5.10.x
Cc: <stable(a)vger.kernel.org> # 5.15.x
Cc: <stable(a)vger.kernel.org> # 5.16.x
Reported-by: Michael Stapelberg <michael+drm(a)stapelberg.ch>
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
---
drivers/gpu/drm/vc4/vc4_hdmi.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 8465914892fa..e6aad838065b 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -1739,18 +1739,18 @@ static int vc4_hdmi_cec_adap_enable(struct cec_adapter *adap, bool enable)
u32 val;
int ret;
- ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
- if (ret)
- return ret;
-
- val = HDMI_READ(HDMI_CEC_CNTRL_5);
- val &= ~(VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET |
- VC4_HDMI_CEC_CNT_TO_4700_US_MASK |
- VC4_HDMI_CEC_CNT_TO_4500_US_MASK);
- val |= ((4700 / usecs) << VC4_HDMI_CEC_CNT_TO_4700_US_SHIFT) |
- ((4500 / usecs) << VC4_HDMI_CEC_CNT_TO_4500_US_SHIFT);
-
if (enable) {
+ ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
+ if (ret)
+ return ret;
+
+ val = HDMI_READ(HDMI_CEC_CNTRL_5);
+ val &= ~(VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET |
+ VC4_HDMI_CEC_CNT_TO_4700_US_MASK |
+ VC4_HDMI_CEC_CNT_TO_4500_US_MASK);
+ val |= ((4700 / usecs) << VC4_HDMI_CEC_CNT_TO_4700_US_SHIFT) |
+ ((4500 / usecs) << VC4_HDMI_CEC_CNT_TO_4500_US_SHIFT);
+
HDMI_WRITE(HDMI_CEC_CNTRL_5, val |
VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET);
HDMI_WRITE(HDMI_CEC_CNTRL_5, val);
@@ -1778,7 +1778,10 @@ static int vc4_hdmi_cec_adap_enable(struct cec_adapter *adap, bool enable)
HDMI_WRITE(HDMI_CEC_CPU_MASK_SET, VC4_HDMI_CPU_CEC);
HDMI_WRITE(HDMI_CEC_CNTRL_5, val |
VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET);
+
+ pm_runtime_put(&vc4_hdmi->pdev->dev);
}
+
return 0;
}
@@ -1889,8 +1892,6 @@ static int vc4_hdmi_cec_init(struct vc4_hdmi *vc4_hdmi)
if (ret < 0)
goto err_remove_handlers;
- pm_runtime_put(&vc4_hdmi->pdev->dev);
-
return 0;
err_remove_handlers:
--
2.34.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e464121f2d40eabc7d11823fb26db807ce945df4 Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck(a)intel.com>
Date: Fri, 21 Jan 2022 09:47:38 -0800
Subject: [PATCH] x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
Missed adding the Icelake-D CPU to the list. It uses the same MSRs
to control and read the inventory number as all the other models.
Fixes: dc6b025de95b ("x86/mce: Add Xeon Icelake to list of CPUs that support PPIN")
Reported-by: Ailin Xu <ailin.xu(a)intel.com>
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20220121174743.1875294-2-tony.luck@intel.com
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index bb9a46a804bf..baafbb37be67 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -486,6 +486,7 @@ static void intel_ppin_init(struct cpuinfo_x86 *c)
case INTEL_FAM6_BROADWELL_X:
case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_ICELAKE_X:
+ case INTEL_FAM6_ICELAKE_D:
case INTEL_FAM6_SAPPHIRERAPIDS_X:
case INTEL_FAM6_XEON_PHI_KNL:
case INTEL_FAM6_XEON_PHI_KNM:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c5de60cd622a2607c043ba65e25a6e9998a369f9 Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Mon, 24 Jan 2022 11:58:08 -0800
Subject: [PATCH] perf/core: Fix cgroup event list management
The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
This list is only accessed from current cpu and not protected by any
locks. But from the commit ef54c1a476ae ("perf: Rework
perf_event_exit_event()"), it's possible to access (actually modify)
the list from another cpu.
In the perf_remove_from_context(), it can remove an event from the
context without an IPI when the context is not active. This is not
safe with cgroup events which can have some active events in the
context even if ctx->is_active is 0 at the moment. The target cpu
might be in the middle of list iteration at the same time.
If the event is enabled when it's about to be closed, it might call
perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
on a different cpu.
This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the cpu which the event belongs to.
Let's fallback to IPI to access the cgrp_cpuctx_list from that cpu.
Similarly, perf_install_in_context() should use IPI for the cgroup
events too.
Fixes: ef54c1a476ae ("perf: Rework perf_event_exit_event()")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lkml.kernel.org/r/20220124195808.2252071-1-namhyung@kernel.org
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b1c1928c0e7c..76c754e45d01 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2462,7 +2462,11 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
* event_function_call() user.
*/
raw_spin_lock_irq(&ctx->lock);
- if (!ctx->is_active) {
+ /*
+ * Cgroup events are per-cpu events, and must IPI because of
+ * cgrp_cpuctx_list.
+ */
+ if (!ctx->is_active && !is_cgroup_event(event)) {
__perf_remove_from_context(event, __get_cpu_context(ctx),
ctx, (void *)flags);
raw_spin_unlock_irq(&ctx->lock);
@@ -2895,11 +2899,14 @@ perf_install_in_context(struct perf_event_context *ctx,
* perf_event_attr::disabled events will not run and can be initialized
* without IPI. Except when this is the first event for the context, in
* that case we need the magic of the IPI to set ctx->is_active.
+ * Similarly, cgroup events for the context also needs the IPI to
+ * manipulate the cgrp_cpuctx_list.
*
* The IOC_ENABLE that is sure to follow the creation of a disabled
* event will issue the IPI and reprogram the hardware.
*/
- if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+ if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+ ctx->nr_events && !is_cgroup_event(event)) {
raw_spin_lock_irq(&ctx->lock);
if (ctx->task == TASK_TOMBSTONE) {
raw_spin_unlock_irq(&ctx->lock);
First patch removes old rovers, they are inherently racy
and cause packet drops/delays.
Second patch avoids iterating entire range of ports: It takes way
too long when most tuples are in use.
First patch is slightly mangled; nf_nat_proto_common.c needs minor adjustment
in context.
Florian Westphal (2):
netfilter: nat: remove l4 protocol port rovers
netfilter: nat: limit port clash resolution attempts
include/net/netfilter/nf_nat_l4proto.h | 2 +-
net/netfilter/nf_nat_proto_common.c | 36 ++++++++++++++++++--------
net/netfilter/nf_nat_proto_dccp.c | 5 +---
net/netfilter/nf_nat_proto_sctp.c | 5 +---
net/netfilter/nf_nat_proto_tcp.c | 5 +---
net/netfilter/nf_nat_proto_udp.c | 10 ++-----
6 files changed, 31 insertions(+), 32 deletions(-)
--
2.34.1
The intention of commit 886a9c134755 ("PCI: dwc: Move link handling into
common code") was to standardize the behavior of link down as explained
in its commit log:
"The behavior for a link down was inconsistent as some drivers would fail
probe in that case while others succeed. Let's standardize this to
succeed as there are usecases where devices (and the link) appear later
even without hotplug. For example, a reconfigured FPGA device."
The pci-imx6 still fails to probe when the link is not present, which
causes the following warning:
imx6q-pcie 8ffc000.pcie: Phy link never came up
imx6q-pcie: probe of 8ffc000.pcie failed with error -110
------------[ cut here ]------------
WARNING: CPU: 0 PID: 30 at drivers/regulator/core.c:2257 _regulator_put.part.0+0x1b8/0x1dc
Modules linked in:
CPU: 0 PID: 30 Comm: kworker/u2:2 Not tainted 5.15.0-next-20211103 #1
Hardware name: Freescale i.MX6 SoloX (Device Tree)
Workqueue: events_unbound async_run_entry_fn
[<c0111730>] (unwind_backtrace) from [<c010bb74>] (show_stack+0x10/0x14)
[<c010bb74>] (show_stack) from [<c0f90290>] (dump_stack_lvl+0x58/0x70)
[<c0f90290>] (dump_stack_lvl) from [<c012631c>] (__warn+0xd4/0x154)
[<c012631c>] (__warn) from [<c0f87b00>] (warn_slowpath_fmt+0x74/0xa8)
[<c0f87b00>] (warn_slowpath_fmt) from [<c076b4bc>] (_regulator_put.part.0+0x1b8/0x1dc)
[<c076b4bc>] (_regulator_put.part.0) from [<c076b574>] (regulator_put+0x2c/0x3c)
[<c076b574>] (regulator_put) from [<c08c3740>] (release_nodes+0x50/0x178)
Fix this problem by ignoring the dw_pcie_wait_for_link() error like
it is done on the other dwc drivers.
Tested on imx6sx-sdb and imx6q-sabresd boards.
Cc: <stable(a)vger.kernel.org>
Fixes: 886a9c134755 ("PCI: dwc: Move link handling into common code")
Signed-off-by: Fabio Estevam <festevam(a)gmail.com>
---
Changes since v1:
- Remove the printk timestamp from the kernel warning log (Richard).
drivers/pci/controller/dwc/pci-imx6.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index 2ac081510632..5e8a03061b31 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -807,9 +807,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
/* Start LTSSM. */
imx6_pcie_ltssm_enable(dev);
- ret = dw_pcie_wait_for_link(pci);
- if (ret)
- goto err_reset_phy;
+ dw_pcie_wait_for_link(pci);
if (pci->link_gen == 2) {
/* Allow Gen2 mode after the link is up. */
@@ -845,11 +843,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
}
/* Make sure link training is finished as well! */
- ret = dw_pcie_wait_for_link(pci);
- if (ret) {
- dev_err(dev, "Failed to bring link up!\n");
- goto err_reset_phy;
- }
+ dw_pcie_wait_for_link(pci);
} else {
dev_info(dev, "Link: Gen2 disabled\n");
}
--
2.25.1
Commit effa453168a7 ("i2c: i801: Don't silently correct invalid transfer
size") revealed that ee1004_eeprom_read() did not properly limit how
many bytes to read at once.
In particular, i2c_smbus_read_i2c_block_data_or_emulated() takes the
length to read as an u8. If count == 256 after taking into account the
offset and page boundary, the cast to u8 overflows. And this is common
when user space tries to read the entire EEPROM at once.
To fix it, limit each read to I2C_SMBUS_BLOCK_MAX (32) bytes, already
the maximum length i2c_smbus_read_i2c_block_data_or_emulated() allows.
Fixes: effa453168a7 ("i2c: i801: Don't silently correct invalid transfer size")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jonas Malaco <jonas(a)protocubo.io>
---
drivers/misc/eeprom/ee1004.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/misc/eeprom/ee1004.c b/drivers/misc/eeprom/ee1004.c
index bb9c4512c968..9fbfe784d710 100644
--- a/drivers/misc/eeprom/ee1004.c
+++ b/drivers/misc/eeprom/ee1004.c
@@ -114,6 +114,9 @@ static ssize_t ee1004_eeprom_read(struct i2c_client *client, char *buf,
if (offset + count > EE1004_PAGE_SIZE)
count = EE1004_PAGE_SIZE - offset;
+ if (count > I2C_SMBUS_BLOCK_MAX)
+ count = I2C_SMBUS_BLOCK_MAX;
+
return i2c_smbus_read_i2c_block_data_or_emulated(client, offset, count, buf);
}
base-commit: 88808fbbead481aedb46640a5ace69c58287f56a
--
2.35.1
With newer versions of GCC, there is a panic in da850_evm_config_emac()
when booting multi_v5_defconfig in QEMU under the palmetto-bmc machine:
Unable to handle kernel NULL pointer dereference at virtual address 00000020
pgd = (ptrval)
[00000020] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0 #1
Hardware name: Generic DT based system
PC is at da850_evm_config_emac+0x1c/0x120
LR is at do_one_initcall+0x50/0x1e0
The emac_pdata pointer in soc_info is NULL because davinci_soc_info only
gets populated on davinci machines but da850_evm_config_emac() is called
on all machines via device_initcall().
Move the rmii_en assignment below the machine check so that it is only
dereferenced when running on a supported SoC.
Cc: stable(a)vger.kernel.org
Fixes: bae105879f2f ("davinci: DA850/OMAP-L138 EVM: implement autodetect of RMII PHY")
Link: https://lore.kernel.org/r/YcS4xVWs6bQlQSPC@archlinux-ax161/
Reviewed-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
arch/arm/mach-davinci/board-da850-evm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-davinci/board-da850-evm.c b/arch/arm/mach-davinci/board-da850-evm.c
index 428012687a80..7f7f6bae21c2 100644
--- a/arch/arm/mach-davinci/board-da850-evm.c
+++ b/arch/arm/mach-davinci/board-da850-evm.c
@@ -1101,11 +1101,13 @@ static int __init da850_evm_config_emac(void)
int ret;
u32 val;
struct davinci_soc_info *soc_info = &davinci_soc_info;
- u8 rmii_en = soc_info->emac_pdata->rmii_en;
+ u8 rmii_en;
if (!machine_is_davinci_da850_evm())
return 0;
+ rmii_en = soc_info->emac_pdata->rmii_en;
+
cfg_chip3_base = DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP3_REG);
val = __raw_readl(cfg_chip3_base);
base-commit: a7904a538933c525096ca2ccde1e60d0ee62c08e
--
2.34.1
This reverts commit d5f13bbb51046537b2c2b9868177fb8fe8a6a6e9.
This change related to fw_devlink was backported to v5.10 but has
severaly other dependencies that were not backported. As discussed
with the original author, the best approach for v5.10 is to revert.
Link: https://lore.kernel.org/linux-omap/7hk0efmfzo.fsf@baylibre.com
Cc: Saravana Kannan <saravanak(a)google.com>
Signed-off-by: Kevin Hilman <khilman(a)baylibre.com>
---
drivers/bus/simple-pm-bus.c | 39 +------------------------------------
1 file changed, 1 insertion(+), 38 deletions(-)
diff --git a/drivers/bus/simple-pm-bus.c b/drivers/bus/simple-pm-bus.c
index 244b8f3b38b4..c5eb46cbf388 100644
--- a/drivers/bus/simple-pm-bus.c
+++ b/drivers/bus/simple-pm-bus.c
@@ -16,33 +16,7 @@
static int simple_pm_bus_probe(struct platform_device *pdev)
{
- const struct device *dev = &pdev->dev;
- struct device_node *np = dev->of_node;
- const struct of_device_id *match;
-
- /*
- * Allow user to use driver_override to bind this driver to a
- * transparent bus device which has a different compatible string
- * that's not listed in simple_pm_bus_of_match. We don't want to do any
- * of the simple-pm-bus tasks for these devices, so return early.
- */
- if (pdev->driver_override)
- return 0;
-
- match = of_match_device(dev->driver->of_match_table, dev);
- /*
- * These are transparent bus devices (not simple-pm-bus matches) that
- * have their child nodes populated automatically. So, don't need to
- * do anything more. We only match with the device if this driver is
- * the most specific match because we don't want to incorrectly bind to
- * a device that has a more specific driver.
- */
- if (match && match->data) {
- if (of_property_match_string(np, "compatible", match->compatible) == 0)
- return 0;
- else
- return -ENODEV;
- }
+ struct device_node *np = pdev->dev.of_node;
dev_dbg(&pdev->dev, "%s\n", __func__);
@@ -56,25 +30,14 @@ static int simple_pm_bus_probe(struct platform_device *pdev)
static int simple_pm_bus_remove(struct platform_device *pdev)
{
- const void *data = of_device_get_match_data(&pdev->dev);
-
- if (pdev->driver_override || data)
- return 0;
-
dev_dbg(&pdev->dev, "%s\n", __func__);
pm_runtime_disable(&pdev->dev);
return 0;
}
-#define ONLY_BUS ((void *) 1) /* Match if the device is only a bus. */
-
static const struct of_device_id simple_pm_bus_of_match[] = {
{ .compatible = "simple-pm-bus", },
- { .compatible = "simple-bus", .data = ONLY_BUS },
- { .compatible = "simple-mfd", .data = ONLY_BUS },
- { .compatible = "isa", .data = ONLY_BUS },
- { .compatible = "arm,amba-bus", .data = ONLY_BUS },
{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, simple_pm_bus_of_match);
--
2.34.0
Dear Greg and Sasha,
can you, please, cherry-pick following commit from Linus' tree:
commit 899663be5e75dc0174dc8bda0b5e6826edf0b29a with subject line
"Bluetooth: refactor malicious adv data check".
There is missing Fixes: tag inside commit log, which causes this patch
to not go to stable trees, but it actually fixes false-positive messages
in dmesg.
I've seen 2 reports and I've asked reporters to test this patch and they
have reported, that this patch actually fixes the problem [1] (second
report was private, so I can't refer it there)
[1]
https://lore.kernel.org/linux-bluetooth/CAC2ZOYuXg4btk9PaE3whmP7JnntsixEvDu…
With regards,
Pavel Skripkin
From: Daniel Borkmann <daniel(a)iogearbox.net>
commit 050fad7c4534c13c8eb1d9c2ba66012e014773cb upstream.
Recently during testing, I ran into the following panic:
[ 207.892422] Internal error: Accessing user space memory outside uaccess.h routines: 96000004 [#1] SMP
[ 207.901637] Modules linked in: binfmt_misc [...]
[ 207.966530] CPU: 45 PID: 2256 Comm: test_verifier Tainted: G W 4.17.0-rc3+ #7
[ 207.974956] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
[ 207.982428] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 207.987214] pc : bpf_skb_load_helper_8_no_cache+0x34/0xc0
[ 207.992603] lr : 0xffff000000bdb754
[ 207.996080] sp : ffff000013703ca0
[ 207.999384] x29: ffff000013703ca0 x28: 0000000000000001
[ 208.004688] x27: 0000000000000001 x26: 0000000000000000
[ 208.009992] x25: ffff000013703ce0 x24: ffff800fb4afcb00
[ 208.015295] x23: ffff00007d2f5038 x22: ffff00007d2f5000
[ 208.020599] x21: fffffffffeff2a6f x20: 000000000000000a
[ 208.025903] x19: ffff000009578000 x18: 0000000000000a03
[ 208.031206] x17: 0000000000000000 x16: 0000000000000000
[ 208.036510] x15: 0000ffff9de83000 x14: 0000000000000000
[ 208.041813] x13: 0000000000000000 x12: 0000000000000000
[ 208.047116] x11: 0000000000000001 x10: ffff0000089e7f18
[ 208.052419] x9 : fffffffffeff2a6f x8 : 0000000000000000
[ 208.057723] x7 : 000000000000000a x6 : 00280c6160000000
[ 208.063026] x5 : 0000000000000018 x4 : 0000000000007db6
[ 208.068329] x3 : 000000000008647a x2 : 19868179b1484500
[ 208.073632] x1 : 0000000000000000 x0 : ffff000009578c08
[ 208.078938] Process test_verifier (pid: 2256, stack limit = 0x0000000049ca7974)
[ 208.086235] Call trace:
[ 208.088672] bpf_skb_load_helper_8_no_cache+0x34/0xc0
[ 208.093713] 0xffff000000bdb754
[ 208.096845] bpf_test_run+0x78/0xf8
[ 208.100324] bpf_prog_test_run_skb+0x148/0x230
[ 208.104758] sys_bpf+0x314/0x1198
[ 208.108064] el0_svc_naked+0x30/0x34
[ 208.111632] Code: 91302260 f9400001 f9001fa1 d2800001 (29500680)
[ 208.117717] ---[ end trace 263cb8a59b5bf29f ]---
The program itself which caused this had a long jump over the whole
instruction sequence where all of the inner instructions required
heavy expansions into multiple BPF instructions. Additionally, I also
had BPF hardening enabled which requires once more rewrites of all
constant values in order to blind them. Each time we rewrite insns,
bpf_adj_branches() would need to potentially adjust branch targets
which cross the patchlet boundary to accommodate for the additional
delta. Eventually that lead to the case where the target offset could
not fit into insn->off's upper 0x7fff limit anymore where then offset
wraps around becoming negative (in s16 universe), or vice versa
depending on the jump direction.
Therefore it becomes necessary to detect and reject any such occasions
in a generic way for native eBPF and cBPF to eBPF migrations. For
the latter we can simply check bounds in the bpf_convert_filter()'s
BPF_EMIT_JMP helper macro and bail out once we surpass limits. The
bpf_patch_insn_single() for native eBPF (and cBPF to eBPF in case
of subsequent hardening) is a bit more complex in that we need to
detect such truncations before hitting the bpf_prog_realloc(). Thus
the latter is split into an extra pass to probe problematic offsets
on the original program in order to fail early. With that in place
and carefully tested I no longer hit the panic and the rewrites are
rejected properly. The above example panic I've seen on bpf-next,
though the issue itself is generic in that a guard against this issue
in bpf seems more appropriate in this case.
Cc: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
[ab: Dropped BPF_PSEUDO_CALL hardening, introoduced in 4.16]
Signed-off-by: Alessio Balsini <balsini(a)android.com>
---
kernel/bpf/core.c | 59 ++++++++++++++++++++++++++++++++++++++++-------
net/core/filter.c | 11 +++++++--
2 files changed, 60 insertions(+), 10 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 485e319ba742..220c43a085e4 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -228,27 +228,57 @@ static bool bpf_is_jmp_and_has_target(const struct bpf_insn *insn)
BPF_OP(insn->code) != BPF_EXIT;
}
-static void bpf_adj_branches(struct bpf_prog *prog, u32 pos, u32 delta)
+static int bpf_adj_delta_to_off(struct bpf_insn *insn, u32 pos, u32 delta,
+ u32 curr, const bool probe_pass)
{
+ const s32 off_min = S16_MIN, off_max = S16_MAX;
+ s32 off = insn->off;
+
+ if (curr < pos && curr + off + 1 > pos)
+ off += delta;
+ else if (curr > pos + delta && curr + off + 1 <= pos + delta)
+ off -= delta;
+ if (off < off_min || off > off_max)
+ return -ERANGE;
+ if (!probe_pass)
+ insn->off = off;
+ return 0;
+}
+
+static int bpf_adj_branches(struct bpf_prog *prog, u32 pos, u32 delta,
+ const bool probe_pass)
+{
+ u32 i, insn_cnt = prog->len + (probe_pass ? delta : 0);
struct bpf_insn *insn = prog->insnsi;
- u32 i, insn_cnt = prog->len;
+ int ret = 0;
for (i = 0; i < insn_cnt; i++, insn++) {
+ /* In the probing pass we still operate on the original,
+ * unpatched image in order to check overflows before we
+ * do any other adjustments. Therefore skip the patchlet.
+ */
+ if (probe_pass && i == pos) {
+ i += delta + 1;
+ insn++;
+ }
+
if (!bpf_is_jmp_and_has_target(insn))
continue;
- /* Adjust offset of jmps if we cross boundaries. */
- if (i < pos && i + insn->off + 1 > pos)
- insn->off += delta;
- else if (i > pos + delta && i + insn->off + 1 <= pos + delta)
- insn->off -= delta;
+ /* Adjust offset of jmps if we cross patch boundaries. */
+ ret = bpf_adj_delta_to_off(insn, pos, delta, i, probe_pass);
+ if (ret)
+ break;
}
+
+ return ret;
}
struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
const struct bpf_insn *patch, u32 len)
{
u32 insn_adj_cnt, insn_rest, insn_delta = len - 1;
+ const u32 cnt_max = S16_MAX;
struct bpf_prog *prog_adj;
/* Since our patchlet doesn't expand the image, we're done. */
@@ -259,6 +289,15 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
insn_adj_cnt = prog->len + insn_delta;
+ /* Reject anything that would potentially let the insn->off
+ * target overflow when we have excessive program expansions.
+ * We need to probe here before we do any reallocation where
+ * we afterwards may not fail anymore.
+ */
+ if (insn_adj_cnt > cnt_max &&
+ bpf_adj_branches(prog, off, insn_delta, true))
+ return NULL;
+
/* Several new instructions need to be inserted. Make room
* for them. Likely, there's no need for a new allocation as
* last page could have large enough tailroom.
@@ -284,7 +323,11 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
sizeof(*patch) * insn_rest);
memcpy(prog_adj->insnsi + off, patch, sizeof(*patch) * len);
- bpf_adj_branches(prog_adj, off, insn_delta);
+ /* We are guaranteed to not fail at this point, otherwise
+ * the ship has sailed to reverse to the original state. An
+ * overflow cannot happen at this point.
+ */
+ BUG_ON(bpf_adj_branches(prog_adj, off, insn_delta, false));
return prog_adj;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 729e302bba6e..9b934767a1d8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -472,11 +472,18 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
#define BPF_EMIT_JMP \
do { \
+ const s32 off_min = S16_MIN, off_max = S16_MAX; \
+ s32 off; \
+ \
if (target >= len || target < 0) \
goto err; \
- insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+ off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
/* Adjust pc relative offset for 2nd or 3rd insn. */ \
- insn->off -= insn - tmp_insns; \
+ off -= insn - tmp_insns; \
+ /* Reject anything not fitting into insn->off. */ \
+ if (off < off_min || off > off_max) \
+ goto err; \
+ insn->off = off; \
} while (0)
case BPF_JMP | BPF_JA:
--
2.35.0.rc2.247.g8bbb082509-goog
Attached are two patches that take the place of one IPA fix that was
requested for back-port. The original did not apply cleanly, so one
prerequisite was back-ported, followed by the "real" fix. They are
built upon Linux v5.10.95.
I wasn't sure how I was supposed to indicate this was a manual
back-port. I added the upstream commit ID in each patch but it
needs to be edited. And I added a "References" tag indicating the
e-mail to which this small series responds.
Please let me know if there's something else/more I should do.
Thanks.
-Alex
Alex Elder (2):
net: ipa: use a bitmap for endpoint replenish_enabled
net: ipa: prevent concurrent replenish
drivers/net/ipa/ipa_endpoint.c | 20 ++++++++++++++++----
drivers/net/ipa/ipa_endpoint.h | 15 ++++++++++++++-
2 files changed, 30 insertions(+), 5 deletions(-)
--
2.32.0
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 998c0bd2b3715244da7639cc4e6a2062cb79c3f4 Mon Sep 17 00:00:00 2001
From: Alex Elder <elder(a)linaro.org>
Date: Wed, 12 Jan 2022 07:30:12 -0600
Subject: [PATCH] net: ipa: prevent concurrent replenish
We have seen cases where an endpoint RX completion interrupt arrives
while replenishing for the endpoint is underway. This causes another
instance of replenishing to begin as part of completing the receive
transaction. If this occurs it can lead to transaction corruption.
Use a new flag to ensure only one replenish instance for an endpoint
executes at a time.
Fixes: 84f9bd12d46db ("soc: qcom: ipa: IPA endpoints")
Signed-off-by: Alex Elder <elder(a)linaro.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ipa/ipa_endpoint.c b/drivers/net/ipa/ipa_endpoint.c
index cddddcedaf72..68291a3efd04 100644
--- a/drivers/net/ipa/ipa_endpoint.c
+++ b/drivers/net/ipa/ipa_endpoint.c
@@ -1088,15 +1088,27 @@ static void ipa_endpoint_replenish(struct ipa_endpoint *endpoint, bool add_one)
return;
}
+ /* If already active, just update the backlog */
+ if (test_and_set_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags)) {
+ if (add_one)
+ atomic_inc(&endpoint->replenish_backlog);
+ return;
+ }
+
while (atomic_dec_not_zero(&endpoint->replenish_backlog))
if (ipa_endpoint_replenish_one(endpoint))
goto try_again_later;
+
+ clear_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags);
+
if (add_one)
atomic_inc(&endpoint->replenish_backlog);
return;
try_again_later:
+ clear_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags);
+
/* The last one didn't succeed, so fix the backlog */
delta = add_one ? 2 : 1;
backlog = atomic_add_return(delta, &endpoint->replenish_backlog);
@@ -1691,6 +1703,7 @@ static void ipa_endpoint_setup_one(struct ipa_endpoint *endpoint)
* backlog is the same as the maximum outstanding TREs.
*/
clear_bit(IPA_REPLENISH_ENABLED, endpoint->replenish_flags);
+ clear_bit(IPA_REPLENISH_ACTIVE, endpoint->replenish_flags);
atomic_set(&endpoint->replenish_saved,
gsi_channel_tre_max(gsi, endpoint->channel_id));
atomic_set(&endpoint->replenish_backlog, 0);
diff --git a/drivers/net/ipa/ipa_endpoint.h b/drivers/net/ipa/ipa_endpoint.h
index 07d5c20e5f00..0313cdc607de 100644
--- a/drivers/net/ipa/ipa_endpoint.h
+++ b/drivers/net/ipa/ipa_endpoint.h
@@ -44,10 +44,12 @@ enum ipa_endpoint_name {
* enum ipa_replenish_flag: RX buffer replenish flags
*
* @IPA_REPLENISH_ENABLED: Whether receive buffer replenishing is enabled
+ * @IPA_REPLENISH_ACTIVE: Whether replenishing is underway
* @IPA_REPLENISH_COUNT: Number of defined replenish flags
*/
enum ipa_replenish_flag {
IPA_REPLENISH_ENABLED,
+ IPA_REPLENISH_ACTIVE,
IPA_REPLENISH_COUNT, /* Number of flags (must be last) */
};
XXX commit 6c0e3b5ce94947b311348c367db9e11dcb2ccc93 upstream.
In ipa_endpoint_replenish(), if an error occurs when attempting to
replenish a receive buffer, we just quit and try again later. In
that case we increment the backlog count to reflect that the attempt
was unsuccessful. Then, if the add_one flag was true we increment
the backlog again.
This second increment is not included in the backlog local variable
though, and its value determines whether delayed work should be
scheduled. This is a bug.
Fix this by determining whether 1 or 2 should be added to the
backlog before adding it in a atomic_add_return() call.
Reference: https://lore.kernel.org/stable/164303143324018@kroah.com
Reviewed-by: Matthias Kaehlcke <mka(a)chromium.org>
Fixes: 84f9bd12d46db ("soc: qcom: ipa: IPA endpoints")
Signed-off-by: Alex Elder <elder(a)linaro.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
drivers/net/ipa/ipa_endpoint.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ipa/ipa_endpoint.c b/drivers/net/ipa/ipa_endpoint.c
index a37aae00e128f..397323f9e5d64 100644
--- a/drivers/net/ipa/ipa_endpoint.c
+++ b/drivers/net/ipa/ipa_endpoint.c
@@ -918,10 +918,7 @@ static void ipa_endpoint_replenish(struct ipa_endpoint *endpoint, u32 count)
try_again_later:
/* The last one didn't succeed, so fix the backlog */
- backlog = atomic_inc_return(&endpoint->replenish_backlog);
-
- if (count)
- atomic_add(count, &endpoint->replenish_backlog);
+ backlog = atomic_add_return(count + 1, &endpoint->replenish_backlog);
/* Whenever a receive buffer transaction completes we'll try to
* replenish again. It's unlikely, but if we fail to supply even
--
2.32.0
Syzbot found a GPF in reweight_entity. This has been bisected to commit
4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
within a thread group that causes a null-ptr-deref in reweight_entity()
in CFS. The scenario is that the main process spawns number of new
threads, which then call setpriority(PRIO_PGRP, 0, -20), wait, and exit.
For each of the new threads the copy_process() gets invoked, which adds
the new task_struct and calls sched_post_fork() for it.
In the above scenario there is a possibility that setpriority(PRIO_PGRP)
and set_one_prio() will be called for a thread in the group that is just
being created by copy_process(), and for which the sched_post_fork() has
not been executed yet. This will trigger a null pointer dereference in
reweight_entity(), as it will try to access the run queue pointer, which
hasn't been set. This results it a crash as shown below:
KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
CPU: 0 PID: 2392 Comm: reduced_repro Not tainted 5.16.0-11201-gb42c5a161ea3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
RIP: 0010:reweight_entity+0x15d/0x440
RSP: 0018:ffffc900035dfcf8 EFLAGS: 00010006
Call Trace:
<TASK>
reweight_task+0xde/0x1c0
set_load_weight+0x21c/0x2b0
set_user_nice.part.0+0x2d1/0x519
set_user_nice.cold+0x8/0xd
set_one_prio+0x24f/0x263
__do_sys_setpriority+0x2d3/0x640
__x64_sys_setpriority+0x84/0x8b
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
---[ end trace 9dc80a9d378ed00a ]---
Before the mentioned change the cfs_rq pointer for the task has been
set in sched_fork(), which is called much earlier in copy_process(),
before the new task is added to the thread_group.
Now it is done in the sched_post_fork(), which is called after that.
To fix the issue the update_load condition passed to set_load_weight()
in set_user_nice() and __sched_setscheduler() has been changed from
always true to true if the task->state != TASK_NEW.
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: Zhang Qiao <zhangqiao22(a)huawei.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: stable(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=9d9c27adc674e3a7932b22b61c79a02da82cbd…
Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: syzbot+af7a719bc92395ee41b3(a)syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
---
Changes in v3:
- Removed the new check and changed the update_load condition from
always true to true if p->state != TASK_NEW
Changes in v2:
- Added a check in set_user_nice(), and return from there if the task
is not fully setup instead of returning from reweight_entity()
---
---
kernel/sched/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 848eaa0efe0e..3d7ede06b971 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6921,7 +6921,7 @@ void set_user_nice(struct task_struct *p, long nice)
put_prev_task(rq, p);
p->static_prio = NICE_TO_PRIO(nice);
- set_load_weight(p, true);
+ set_load_weight(p, !(READ_ONCE(p->__state) & TASK_NEW));
old_prio = p->prio;
p->prio = effective_prio(p);
@@ -7212,7 +7212,7 @@ static void __setscheduler_params(struct task_struct *p,
*/
p->rt_priority = attr->sched_priority;
p->normal_prio = normal_prio(p);
- set_load_weight(p, true);
+ set_load_weight(p, !(READ_ONCE(p->__state) & TASK_NEW));
}
/*
--
2.34.1
Sasha, Greg,
Could you queue following two commits for 4.19.y tree?
6ed5943f8735e2b778d92ea4d9805c0a1d89bc2b
netfilter: nat: remove l4 protocol port rovers
a504b703bb1da526a01593da0e4be2af9d9f5fa8
netfilter: nat: limit port clash resolution attempts
This resolves softlockup when most of the ephemeral ports
are in use.
Its also needed on older kernels but unfortunately they
won't apply as-is. We will try to get modified backports
for older releases and forward them to stable@ later.
Thanks,
Florian