The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 47a1db8e797da01a1309bf42e0c0d771d4e4d4f3 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Wed, 1 Dec 2021 14:25:26 +0100
Subject: [PATCH] firmware: qemu_fw_cfg: fix kobject leak in probe error path
An initialised kobject must be freed using kobject_put() to avoid
leaking associated resources (e.g. the object name).
Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed"
the leak in the first error path of the file registration helper but
left the second one unchanged. This "fix" would, however, result in a NULL
pointer dereference, because the release function also removes the
never-added entry from the fw_cfg_entry_cache list. That issue has now
been addressed.
Fix the remaining kobject leak by restoring the common error path and
adding the missing kobject_put().
Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg device")
Cc: stable(a)vger.kernel.org # 4.6
Cc: Gabriel Somlo <somlo(a)cmu.edu>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Link: https://lore.kernel.org/r/20211201132528.30025-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index a9c64ebfc49a..ccb7ed62452f 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -603,15 +603,13 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
/* register entry under "/sys/firmware/qemu_fw_cfg/by_key/" */
err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
fw_cfg_sel_ko, "%d", entry->select);
- if (err) {
- kobject_put(&entry->kobj);
- return err;
- }
+ if (err)
+ goto err_put_entry;
/* add raw binary content access */
err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
if (err)
- goto err_add_raw;
+ goto err_del_entry;
/* try adding "/sys/firmware/qemu_fw_cfg/by_name/" symlink */
fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->name);
@@ -620,9 +618,10 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
fw_cfg_sysfs_cache_enlist(entry);
return 0;
-err_add_raw:
+err_del_entry:
kobject_del(&entry->kobj);
- kfree(entry);
+err_put_entry:
+ kobject_put(&entry->kobj);
return err;
}
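For reference, the error-handling shape the fix restores is the usual
kobject pattern: once kobject_init_and_add() has been called, every error
path must end in kobject_put() so that the ktype's release function can
free the object name and the containing structure. A minimal standalone
sketch (the example_* names are placeholders, not the driver's code):

static int example_register(struct example *e)
{
	int err;

	err = kobject_init_and_add(&e->kobj, &example_ktype,
				   parent_kobj, "%d", e->id);
	if (err)
		goto err_put;	/* initialised, so a put is required */

	err = sysfs_create_bin_file(&e->kobj, &example_attr);
	if (err)
		goto err_del;

	return 0;

err_del:
	kobject_del(&e->kobj);	/* undo the sysfs registration */
err_put:
	kobject_put(&e->kobj);	/* drop the ref; release() frees the name */
	return err;
}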
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23584c1ed3e15a6f4bfab8dc5a88d94ab929ee12 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Wed, 17 Nov 2021 23:22:09 +0100
Subject: [PATCH] PCI: pciehp: Fix infinite loop in IRQ handler upon power
fault
The Power Fault Detected bit in the Slot Status register differs from
all other hotplug events in that it is sticky: It can only be cleared
after turning off slot power. Per PCIe r5.0, sec. 6.7.1.8:
If a power controller detects a main power fault on the hot-plug slot,
it must automatically set its internal main power fault latch [...].
The main power fault latch is cleared when software turns off power to
the hot-plug slot.
The stickiness used to cause interrupt storms and infinite loops which
were fixed in 2009 by commits 5651c48cfafe ("PCI pciehp: fix power fault
interrupt storm problem") and 99f0169c17f3 ("PCI: pciehp: enable
software notification on empty slots").
Unfortunately in 2020 the infinite loop issue was inadvertently
reintroduced by commit 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt
race"): The hardirq handler pciehp_isr() clears the PFD bit until
pciehp's power_fault_detected flag is set. That happens in the IRQ
thread pciehp_ist(), which never learns of the event because the hardirq
handler is stuck in an infinite loop. Fix by setting the
power_fault_detected flag in the hardirq handler itself.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214989
Link: https://lore.kernel.org/linux-pci/DM8PR11MB5702255A6A92F735D90A4446868B9@DM…
Fixes: 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt race")
Link: https://lore.kernel.org/r/66eaeef31d4997ceea357ad93259f290ededecfd.16371872…
Reported-by: Joseph Bao <joseph.bao(a)intel.com>
Tested-by: Joseph Bao <joseph.bao(a)intel.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v4.19+
Cc: Stuart Hayes <stuart.w.hayes(a)gmail.com>
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 83a0fa119cae..9535c61cbff3 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -642,6 +642,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
*/
if (ctrl->power_fault_detected)
status &= ~PCI_EXP_SLTSTA_PFD;
+ else if (status & PCI_EXP_SLTSTA_PFD)
+ ctrl->power_fault_detected = true;
events |= status;
if (!events) {
@@ -651,7 +653,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
}
if (status) {
- pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
+ pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status);
/*
* In MSI mode, all event bits must be zero before the port
@@ -725,8 +727,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
}
/* Check Power Fault Detected */
- if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
- ctrl->power_fault_detected = 1;
+ if (events & PCI_EXP_SLTSTA_PFD) {
ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(ctrl));
pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
PCI_EXP_SLTCTL_ATTN_IND_ON);
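To make the resulting interrupt flow explicit, here is a condensed sketch
of the hardirq path after the fix (simplified from the driver, not a
verbatim excerpt): the sticky PFD bit is latched into the software flag
exactly once, in the hardirq handler itself, and masked from the events on
every later interrupt, while the full status read (including a masked PFD)
is still written back so that in MSI mode the port can reassert its
interrupt:

	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);

	if (ctrl->power_fault_detected)
		status &= ~PCI_EXP_SLTSTA_PFD;	/* already latched, ignore sticky bit */
	else if (status & PCI_EXP_SLTSTA_PFD)
		ctrl->power_fault_detected = true;	/* latch here, not in the IRQ thread */

	events |= status;

	/* Clear every bit that was read, including a masked PFD, so all
	 * event bits go to zero and the port can retrigger its MSI.
	 */
	if (status)
		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status);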
Commit b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support")
plugged a packet leak between ports that were members of different bridges.
Unfortunately, this broke another use case, namely that of more than two
ports that are members of the same bridge.
After that commit, when a port is added to a bridge, hardware bridging
between other member ports of that bridge will be cleared, preventing
packet exchange between them.
Fix by ensuring that the Port VLAN Membership bitmap includes any existing
ports in the bridge, not just the port being added.
Upstream commit 3d00827a90db6f79abc7cdc553887f89a2e0a184, backported to 5.16.
Fixes: b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support")
Signed-off-by: Svenning Sørensen <sss(a)secomea.com>
Tested-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
drivers/net/dsa/microchip/ksz_common.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 8a04302018dc..7ab9ab58de65 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -26,7 +26,7 @@ void ksz_update_port_member(struct ksz_device *dev, int port)
struct dsa_switch *ds = dev->ds;
u8 port_member = 0, cpu_port;
const struct dsa_port *dp;
- int i;
+ int i, j;
if (!dsa_is_user_port(ds, port))
return;
@@ -45,13 +45,33 @@ void ksz_update_port_member(struct ksz_device *dev, int port)
continue;
if (!dp->bridge_dev || dp->bridge_dev != other_dp->bridge_dev)
continue;
+ if (other_p->stp_state != BR_STATE_FORWARDING)
+ continue;
- if (other_p->stp_state == BR_STATE_FORWARDING &&
- p->stp_state == BR_STATE_FORWARDING) {
+ if (p->stp_state == BR_STATE_FORWARDING) {
val |= BIT(port);
port_member |= BIT(i);
}
+ /* Retain port [i]'s relationship to other ports than [port] */
+ for (j = 0; j < ds->num_ports; j++) {
+ const struct dsa_port *third_dp;
+ struct ksz_port *third_p;
+
+ if (j == i)
+ continue;
+ if (j == port)
+ continue;
+ if (!dsa_is_user_port(ds, j))
+ continue;
+ third_p = &dev->ports[j];
+ if (third_p->stp_state != BR_STATE_FORWARDING)
+ continue;
+ third_dp = dsa_to_port(ds, j);
+ if (third_dp->bridge_dev == dp->bridge_dev)
+ val |= BIT(j);
+ }
+
dev->dev_ops->cfg_port_member(dev, i, val | cpu_port);
}
--
2.20.1
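To restate the rule the patch implements: when port [port] joins a bridge,
recomputing member port [i]'s Port VLAN Membership must retain every other
forwarding user port of the same bridge, not just [port]. A condensed
sketch of that membership computation (a hypothetical helper for
illustration; the driver open-codes this in ksz_update_port_member()):

static u8 bridge_member_mask(struct ksz_device *dev, int i)
{
	struct dsa_switch *ds = dev->ds;
	const struct dsa_port *dp = dsa_to_port(ds, i);
	u8 mask = 0;
	int j;

	for (j = 0; j < ds->num_ports; j++) {
		if (j == i || !dsa_is_user_port(ds, j))
			continue;
		if (dsa_to_port(ds, j)->bridge_dev != dp->bridge_dev)
			continue;
		if (dev->ports[j].stp_state != BR_STATE_FORWARDING)
			continue;
		mask |= BIT(j);	/* keep all forwarding bridge siblings */
	}
	return mask;
}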
Hi Greg,
I'm sorry about the crippled patch in my last message.
Unfortunately, I have no means other than the MS webmail client to send mail,
and it seems to corrupt plain text when replying to a thread.
Therefore, I am sending this as a new message - I really hope it will work this time.
Best regards, Svenning
From: Brett Creeley <brett.creeley(a)intel.com>
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
[I had to fix up the cherry-pick manually, as the patch adds a line
around context that is missing in this tree.]
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, with
one of them being the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Cc: <stable(a)vger.kernel.org> # 5.8.x
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
---
This is for stable trees 5.8 through 5.12. I sent patches for 5.13 and 5.14
separately, since they have slightly different context.
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 25 +++++++++++++++++++
.../net/ethernet/intel/ice/ice_virtchnl_pf.h | 5 ++++
2 files changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 48dee9c5d534..66da8f540454 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -375,6 +375,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1556,6 +1558,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
set_bit(ICE_VIRTCHNL_VF_CAP_L2, &vf->vf_caps);
vf->spoofchk = true;
vf->num_vf_qs = pf->num_qps_per_vf;
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -3389,6 +3393,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -3398,6 +3404,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -3763,6 +3770,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ice_vc_get_ver_msg(vf, msg);
@@ -3839,6 +3855,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -3953,6 +3971,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -3970,6 +3990,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -3999,11 +4020,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 0f519fba3770..59e5b4f16e96 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -68,6 +68,11 @@ struct ice_mdd_vf_events {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
/* first vector index of this VF in the PF space */
--
2.35.1.355.ge7e302376dd6
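Condensed, the locking scheme looks like this (an illustrative sketch, not
the driver's code): the ndo ops take cfg_lock unconditionally around the
configuration change and the VFR they trigger, while the virtchnl handler
only uses mutex_trylock(), since a held lock means a reset is in flight
and the current message is stale:

static int example_ndo_set_trust(struct ice_vf *vf, bool trusted)
{
	mutex_lock(&vf->cfg_lock);	/* serialize against virtchnl handling */
	vf->trusted = trusted;
	ice_vc_reset_vf(vf);		/* triggers a VFR */
	mutex_unlock(&vf->cfg_lock);
	return 0;
}

static void example_process_vf_msg(struct ice_vf *vf)
{
	/* A held lock means an ndo op is about to reset this VF, so the
	 * message being handled is no longer valid; drop it, don't block.
	 */
	if (!mutex_trylock(&vf->cfg_lock))
		return;
	/* ... dispatch on v_opcode ... */
	mutex_unlock(&vf->cfg_lock);
}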
From: Brett Creeley <brett.creeley(a)intel.com>
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
[I had to fix up the cherry-pick manually, as the patch adds a line
around context that is missing in this tree.]
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, with
one of them being that the ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (i.e. VF ndo op) has the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Cc: <stable(a)vger.kernel.org> # 5.13.x
Signed-off-by: Brett Creeley <brett.creeley(a)intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
---
This should apply to 5.13.
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 25 +++++++++++++++++++
.../net/ethernet/intel/ice/ice_virtchnl_pf.h | 5 ++++
2 files changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 671902d9fc35..2629d670bbbf 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -647,6 +647,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1893,6 +1895,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
*/
ice_vf_ctrl_invalidate_vsi(vf);
ice_vf_fdir_init(vf);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -3955,6 +3959,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -3964,6 +3970,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4338,6 +4345,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ice_vc_get_ver_msg(vf, msg);
@@ -4426,6 +4442,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4540,6 +4558,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4557,6 +4577,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4586,11 +4607,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index d800ed83d6c3..3da39d63a24b 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -69,6 +69,11 @@ struct ice_mdd_vf_events {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
--
2.35.1.355.ge7e302376dd6
commit 59348401ebed ("platform/x86: amd-pmc: Add special handling for
timer based S0i3 wakeup") adds support for using another platform timer
in lieu of the RTC, which doesn't work properly on some systems. This path
was validated and worked well before submission. During the 5.16-rc1 merge
window other patches were merged that caused this to stop working properly.
When this feature was used with 5.16-rc1 or later, some OEM laptops with the
matching firmware requirements from that commit would shut down instead of
programming a timer-based wakeup.
This was bisected to commit 8d89835b0467 ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path"). This wasn't supposed to cause any
negative impacts and also tested well on both Intel and ARM platforms.
However this changed the semantics of when CPUs are allowed to be in the
deepest state. For the AMD systems in question it appears this causes a
firmware crash for timer based wakeup.
It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
and all the CPUs going into a deep state while the timer is still being
programmed. It's likely a firmware bug, but to avoid it, don't allow CPUs
to enter the deepest idle state while the CZN timer wakeup path is in use.
If it's later discovered that this also occurs on "regular" suspends
without a timer, or on other silicon, this may be expanded to
run in the suspend path for more scenarios.
Cc: stable(a)vger.kernel.org # 5.16+
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@B…
Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Fixes: 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
Fixes: 59348401ebed ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup")
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Link: https://lore.kernel.org/r/20220223175237.6209-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
(cherry picked from commit 68af28426b3ca1bf9ba21c7d8bdd0ff639e5134c)
---
This didn't apply cleanly to 5.16.y because 5.16.y doesn't contain the STB
feature. Manually fixed up the commit for this.
This is *only* intended for 5.16.
drivers/platform/x86/amd-pmc.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/drivers/platform/x86/amd-pmc.c b/drivers/platform/x86/amd-pmc.c
index 8c74733530e3..11d0f829302b 100644
--- a/drivers/platform/x86/amd-pmc.c
+++ b/drivers/platform/x86/amd-pmc.c
@@ -21,6 +21,7 @@
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/platform_device.h>
+#include <linux/pm_qos.h>
#include <linux/rtc.h>
#include <linux/suspend.h>
#include <linux/seq_file.h>
@@ -79,6 +80,9 @@
#define PMC_MSG_DELAY_MIN_US 50
#define RESPONSE_REGISTER_LOOP_MAX 20000
+/* QoS request for letting CPUs in idle states, but not the deepest */
+#define AMD_PMC_MAX_IDLE_STATE_LATENCY 3
+
#define SOC_SUBSYSTEM_IP_MAX 12
#define DELAY_MIN_US 2000
#define DELAY_MAX_US 3000
@@ -123,6 +127,7 @@ struct amd_pmc_dev {
u8 rev;
struct device *dev;
struct mutex lock; /* generic mutex lock */
+ struct pm_qos_request amd_pmc_pm_qos_req;
#if IS_ENABLED(CONFIG_DEBUG_FS)
struct dentry *dbgfs_dir;
#endif /* CONFIG_DEBUG_FS */
@@ -459,6 +464,14 @@ static int amd_pmc_verify_czn_rtc(struct amd_pmc_dev *pdev, u32 *arg)
rc = rtc_alarm_irq_enable(rtc_device, 0);
dev_dbg(pdev->dev, "wakeup timer programmed for %lld seconds\n", duration);
+ /*
+ * Prevent CPUs from getting into deep idle states while sending OS_HINT
+ * which is otherwise generally safe to send when at least one of the CPUs
+ * is not in deep idle states.
+ */
+ cpu_latency_qos_update_request(&pdev->amd_pmc_pm_qos_req, AMD_PMC_MAX_IDLE_STATE_LATENCY);
+ wake_up_all_idle_cpus();
+
return rc;
}
@@ -476,17 +489,24 @@ static int __maybe_unused amd_pmc_suspend(struct device *dev)
/* Activate CZN specific RTC functionality */
if (pdev->cpu_id == AMD_CPU_ID_CZN) {
rc = amd_pmc_verify_czn_rtc(pdev, &arg);
- if (rc < 0)
- return rc;
+ if (rc)
+ goto fail;
}
/* Dump the IdleMask before we send hint to SMU */
amd_pmc_idlemask_read(pdev, dev, NULL);
msg = amd_pmc_get_os_hint(pdev);
rc = amd_pmc_send_cmd(pdev, arg, NULL, msg, 0);
- if (rc)
+ if (rc) {
dev_err(pdev->dev, "suspend failed\n");
+ goto fail;
+ }
+ return 0;
+fail:
+ if (pdev->cpu_id == AMD_CPU_ID_CZN)
+ cpu_latency_qos_update_request(&pdev->amd_pmc_pm_qos_req,
+ PM_QOS_DEFAULT_VALUE);
return rc;
}
@@ -507,7 +527,12 @@ static int __maybe_unused amd_pmc_resume(struct device *dev)
/* Dump the IdleMask to see the blockers */
amd_pmc_idlemask_read(pdev, dev, NULL);
- return 0;
+ /* Restore the QoS request back to defaults if it was set */
+ if (pdev->cpu_id == AMD_CPU_ID_CZN)
+ cpu_latency_qos_update_request(&pdev->amd_pmc_pm_qos_req,
+ PM_QOS_DEFAULT_VALUE);
+
+ return rc;
}
static const struct dev_pm_ops amd_pmc_pm_ops = {
@@ -597,6 +622,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
amd_pmc_get_smu_version(dev);
platform_set_drvdata(pdev, dev);
amd_pmc_dbgfs_register(dev);
+ cpu_latency_qos_add_request(&dev->amd_pmc_pm_qos_req, PM_QOS_DEFAULT_VALUE);
return 0;
}
--
2.34.1
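For context, the PM QoS mechanics the backport relies on, condensed into
one place (the calls are the ones from the patch; 'req' stands in for
pdev->amd_pmc_pm_qos_req):

	/* probe: register a request that initially permits all idle states */
	cpu_latency_qos_add_request(&req, PM_QOS_DEFAULT_VALUE);

	/* Before sending OS_HINT on CZN: cap the allowed exit latency so no
	 * CPU enters the deepest idle state while the wakeup timer is being
	 * programmed, and kick idle CPUs so the new limit applies at once.
	 */
	cpu_latency_qos_update_request(&req, AMD_PMC_MAX_IDLE_STATE_LATENCY);
	wake_up_all_idle_cpus();

	/* on resume, or if suspend fails: lift the constraint again */
	cpu_latency_qos_update_request(&req, PM_QOS_DEFAULT_VALUE);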
From: Frederic Weisbecker <frederic(a)kernel.org>
commit b2fcf2102049f6e56981e0ab3d9b633b8e2741da upstream.
This sequence of events can lead to a failure to requeue a CPU's
->nocb_timer:
1. There are no callbacks queued for any CPU covered by CPU 0-2's
->nocb_gp_kthread. Note that ->nocb_gp_kthread is associated
with CPU 0.
2. CPU 1 enqueues its first callback with interrupts disabled, and
thus must defer awakening its ->nocb_gp_kthread. It therefore
queues its rcu_data structure's ->nocb_timer. At this point,
CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
3. CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
callback, but with interrupts enabled, allowing it to directly
awaken the ->nocb_gp_kthread.
4. The newly awakened ->nocb_gp_kthread associates both CPU 1's
and CPU 2's callbacks with a future grace period and arranges
for that grace period to be started.
5. This ->nocb_gp_kthread goes to sleep waiting for the end of this
future grace period.
6. This grace period elapses before CPU 1's timer fires.
This is normally improbable given that the timer is set for only
one jiffy, but timers can be delayed. Besides, it is possible
that the kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
7. The grace period ends, so rcu_gp_kthread awakens the
->nocb_gp_kthread, which in turn awakens both CPU 1's and
CPU 2's ->nocb_cb_kthread. Then ->nocb_gp_kthread sleeps
waiting for more newly queued callbacks.
8. CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
waiting for more invocable callbacks.
9. Note that neither kthread updated any ->nocb_timer state,
so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
10. CPU 1 enqueues its second callback, this time with interrupts
enabled so it can directly wake ->nocb_gp_kthread.
It does so by calling wake_nocb_gp(), which also cancels the
pending timer that got queued in step 2. But that doesn't reset
CPU 1's ->nocb_defer_wakeup, which is still set to RCU_NOCB_WAKE.
So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
desynchronized.
11. ->nocb_gp_kthread associates the callback queued in 10 with a new
grace period, arranges for that grace period to start and sleeps
waiting for it to complete.
12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
which in turn wakes up CPU 1's ->nocb_cb_kthread which then
invokes the callback queued in 10.
13. CPU 1 enqueues its third callback, this time with interrupts
disabled, so it must queue a timer for a deferred wakeup. However,
the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE, which
incorrectly indicates that a timer is already queued. In fact,
CPU 1's ->nocb_timer was cancelled in step 10. CPU 1 therefore fails
to queue the ->nocb_timer.
14. CPU 1 has its pending callback, which may go unnoticed until
some other CPU happens to wake up ->nocb_gp_kthread or CPU 1
performs an explicit deferred wakeup, for example, during idle entry.
This commit fixes this bug by resetting rdp->nocb_defer_wakeup every time
we delete the ->nocb_timer.
It is quite possible that there is a similar scenario involving
->nocb_bypass_timer and ->nocb_defer_wakeup. However, despite some
effort from several people, a failure scenario has not yet been located.
That said, this by no means guarantees that no such scenario exists.
Finding a failure scenario is left as an exercise for the reader, and the
"Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer.
Fixes: d1b222c6be1f ("rcu/nocb: Add bypass callback queueing")
Cc: <stable(a)vger.kernel.org>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Reviewed-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Conflicts:
kernel/rcu/tree_plugin.h
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
---
kernel/rcu/tree_plugin.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 244f32e98360fdf..658427c33b9370e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1646,7 +1646,11 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
rcu_nocb_unlock_irqrestore(rdp, flags);
return;
}
- del_timer(&rdp->nocb_timer);
+
+ if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+ WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+ del_timer(&rdp->nocb_timer);
+ }
rcu_nocb_unlock_irqrestore(rdp, flags);
raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2164,7 +2168,6 @@ static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
return;
}
ndw = READ_ONCE(rdp->nocb_defer_wakeup);
- WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
}
--
2.26.0.106.g9fadedd
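The invariant the fix enforces, stated once: ->nocb_defer_wakeup and
->nocb_timer must be retired together, so cancelling the timer can never
again leave the flag claiming a deferred wakeup is pending. In
wake_nocb_gp() that becomes (condensed from the hunk above):

	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
		del_timer(&rdp->nocb_timer);	/* flag and timer cleared as a unit */
	}

Correspondingly, do_nocb_deferred_wakeup_common() no longer clears the
flag itself; it reads the value and leaves the reset to wake_nocb_gp().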