Issuing lpfc_force_rscn twice results in an ndlp kref use-after-free call
trace.
A prior patch reworked the get/put handling by ensuring nlp_get was done
before WQE submission and a put was done in the completion path.
Unfortunately, the issue_els_rscn path had a piece of legacy code that
did a nlp_put, causing an imbalance on the ref counts.
Fixed by removing the unnecessary legacy code snippet.
Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking")
Cc: <stable(a)vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
---
drivers/scsi/lpfc/lpfc_els.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 5c10416c1c75..78024f11b794 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -3538,11 +3538,6 @@ lpfc_issue_els_rscn(struct lpfc_vport *vport, uint8_t retry)
return 1;
}
- /* This will cause the callback-function lpfc_cmpl_els_cmd to
- * trigger the release of node.
- */
- if (!(vport->fc_flag & FC_PT2PT))
- lpfc_nlp_put(ndlp);
return 0;
}
--
2.26.2
Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter. This fixes the
following false-positive lockdep report when unplugging a Lenovo X1C8 from
a Lenovo 2nd gen TB3 dock:
[ 28.583853] pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
[ 28.583891] pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
[ 28.584849] ============================================
[ 28.584854] WARNING: possible recursive locking detected
[ 28.584858] 5.16.0-rc2+ #621 Not tainted
[ 28.584864] --------------------------------------------
[ 28.584867] irq/124-pciehp/86 is trying to acquire lock:
[ 28.584873] ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
[ 28.584904]
but task is already holding lock:
[ 28.584908] ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
[ 28.584929]
other info that might help us debug this:
[ 28.584933] Possible unsafe locking scenario:
[ 28.584936] CPU0
[ 28.584939] ----
[ 28.584942] lock(&ctrl->reset_lock);
[ 28.584949] lock(&ctrl->reset_lock);
[ 28.584955]
*** DEADLOCK ***
[ 28.584959] May be due to missing lock nesting notation
[ 28.584963] 3 locks held by irq/124-pciehp/86:
[ 28.584970] #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
[ 28.584991] #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
[ 28.585012] #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
[ 28.585037]
stack backtrace:
[ 28.585042] CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
[ 28.585052] Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
[ 28.585059] Call Trace:
[ 28.585064] <TASK>
[ 28.585073] dump_stack_lvl+0x59/0x73
[ 28.585087] __lock_acquire.cold+0xc5/0x2c6
[ 28.585106] ? find_held_lock+0x2b/0x80
[ 28.585124] lock_acquire+0xb5/0x2b0
[ 28.585132] ? pciehp_check_presence+0x23/0x80
[ 28.585144] ? lock_is_held_type+0xa8/0x120
[ 28.585161] down_read+0x3e/0x50
[ 28.585172] ? pciehp_check_presence+0x23/0x80
[ 28.585183] pciehp_check_presence+0x23/0x80
[ 28.585194] pciehp_runtime_resume+0x5c/0xa0
[ 28.585206] ? pci_msix_init+0x60/0x60
[ 28.585214] device_for_each_child+0x45/0x70
[ 28.585227] pcie_port_device_runtime_resume+0x20/0x30
[ 28.585236] pci_pm_runtime_resume+0xa7/0xc0
[ 28.585246] ? pci_pm_freeze_noirq+0x100/0x100
[ 28.585257] __rpm_callback+0x41/0x110
[ 28.585271] ? pci_pm_freeze_noirq+0x100/0x100
[ 28.585281] rpm_callback+0x59/0x70
[ 28.585293] rpm_resume+0x512/0x7b0
[ 28.585309] __pm_runtime_resume+0x4a/0x90
[ 28.585322] __device_release_driver+0x28/0x240
[ 28.585338] device_release_driver+0x26/0x40
[ 28.585351] pci_stop_bus_device+0x68/0x90
[ 28.585363] pci_stop_bus_device+0x2c/0x90
[ 28.585373] pci_stop_and_remove_bus_device+0xe/0x20
[ 28.585384] pciehp_unconfigure_device+0x6c/0x110
[ 28.585396] ? __pm_runtime_resume+0x58/0x90
[ 28.585409] pciehp_disable_slot+0x5b/0xe0
[ 28.585421] pciehp_handle_presence_or_link_change+0xc3/0x2f0
[ 28.585436] pciehp_ist+0x179/0x180
[ 28.585449] ? disable_irq_nosync+0x10/0x10
[ 28.585460] irq_thread_fn+0x1d/0x60
[ 28.585470] ? irq_thread+0x81/0x1a0
[ 28.585480] irq_thread+0xcb/0x1a0
[ 28.585491] ? irq_thread_fn+0x60/0x60
[ 28.585502] ? irq_thread_check_affinity+0xb0/0xb0
[ 28.585514] kthread+0x165/0x190
[ 28.585522] ? set_kthread_struct+0x40/0x40
[ 28.585531] ret_from_fork+0x1f/0x30
[ 28.585554] </TASK>
This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock,
so the lockdep splat is a false positive.
Because locks at the same hierarchy level are never acquired recursively,
a per-level lockdep class is sufficient to fix the lockdep warning.
The choice to use one lockdep subclass per pcie-hotplug controller in
the path to the root-bus was made to conserve class keys because their
number is limited and the complexity grows quadratically with number of
keys according to Documentation/locking/lockdep-design.rst.
Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redh…
Cc: stable(a)vger.kernel.org
Reported-by: "Theodore Ts'o" <tytso(a)mit.edu>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
Changes in v2:
- Only use a subclass for each hotplug capable port/parent in the path to
the PCI root bus, instead of one for each level in the PCI hierarchy,
to avoid hitting MAX_LOCKDEP_SUBCLASSES
- Drop the "PCI: Add a pci_dev_depth() helper function" since we now need
a special version of this to only count hotplug ports
- Various commit message improvements
---
drivers/pci/hotplug/pciehp.h | 3 +++
drivers/pci/hotplug/pciehp_core.c | 2 +-
drivers/pci/hotplug/pciehp_hpc.c | 21 ++++++++++++++++++---
3 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 918dccbc74b6..e0a614acee05 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -75,6 +75,8 @@ extern int pciehp_poll_time;
* @reset_lock: prevents access to the Data Link Layer Link Active bit in the
* Link Status register and to the Presence Detect State bit in the Slot
* Status register during a slot reset which may cause them to flap
+ * @depth: Number of additional hotplug ports in the path to the root bus,
+ * used as lock subclass for @reset_lock
* @ist_running: flag to keep user request waiting while IRQ thread is running
* @request_result: result of last user request submitted to the IRQ thread
* @requester: wait queue to wake up on completion of user request,
@@ -106,6 +108,7 @@ struct controller {
struct hotplug_slot hotplug_slot; /* hotplug core interface */
struct rw_semaphore reset_lock;
+ unsigned int depth;
unsigned int ist_running;
int request_result;
wait_queue_head_t requester;
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index f34114d45259..4042d87d539d 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -166,7 +166,7 @@ static void pciehp_check_presence(struct controller *ctrl)
{
int occupied;
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
mutex_lock(&ctrl->state_lock);
occupied = pciehp_card_present_or_link_active(ctrl);
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 83a0fa119cae..963fb50528da 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -583,7 +583,7 @@ static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
* the corresponding link change may have been ignored above.
* Synthesize it to ensure that it is acted on.
*/
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
if (!pciehp_check_link_active(ctrl))
pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
up_read(&ctrl->reset_lock);
@@ -746,7 +746,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
* Disable requests have higher priority than Presence Detect Changed
* or Data Link Layer State Changed events.
*/
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
if (events & DISABLE_SLOT)
pciehp_handle_disable_request(ctrl);
else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
@@ -906,7 +906,7 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
if (probe)
return 0;
- down_write(&ctrl->reset_lock);
+ down_write_nested(&ctrl->reset_lock, ctrl->depth);
if (!ATTN_BUTTN(ctrl)) {
ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
@@ -962,6 +962,20 @@ static inline void dbg_ctrl(struct controller *ctrl)
#define FLAG(x, y) (((x) & (y)) ? '+' : '-')
+static inline int pcie_hotplug_depth(struct pci_dev *dev)
+{
+ struct pci_bus *bus = dev->bus;
+ int depth = 0;
+
+ while (bus->parent) {
+ bus = bus->parent;
+ if (bus->self && bus->self->is_hotplug_bridge)
+ depth++;
+ }
+
+ return depth;
+}
+
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
@@ -975,6 +989,7 @@ struct controller *pcie_init(struct pcie_device *dev)
return NULL;
ctrl->pcie = dev;
+ ctrl->depth = pcie_hotplug_depth(dev->port);
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
if (pdev->hotplug_user_indicators)
--
2.33.1
From: Bob Peterson <rpeterso(a)redhat.com>
[ Upstream commit 49462e2be119d38c5eb5759d0d1b712df3a41239 ]
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
Signed-off-by: Bob Peterson <rpeterso(a)redhat.com>gl_object
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6a355e1347d7f..d2b7ecbd1b150 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1438,13 +1438,6 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_ordered_del_inode(ip);
clear_inode(inode);
gfs2_dir_hash_inval(ip);
- if (ip->i_gl) {
- glock_clear_object(ip->i_gl, ip);
- wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
- gfs2_glock_add_to_lru(ip->i_gl);
- gfs2_glock_put_eventually(ip->i_gl);
- ip->i_gl = NULL;
- }
if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
@@ -1457,6 +1450,13 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_holder_uninit(&ip->i_iopen_gh);
gfs2_glock_put_eventually(gl);
}
+ if (ip->i_gl) {
+ glock_clear_object(ip->i_gl, ip);
+ wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
+ gfs2_glock_add_to_lru(ip->i_gl);
+ gfs2_glock_put_eventually(ip->i_gl);
+ ip->i_gl = NULL;
+ }
}
static struct inode *gfs2_alloc_inode(struct super_block *sb)
--
2.33.0
From: Bob Peterson <rpeterso(a)redhat.com>
[ Upstream commit 49462e2be119d38c5eb5759d0d1b712df3a41239 ]
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
Signed-off-by: Bob Peterson <rpeterso(a)redhat.com>gl_object
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6e00d15ef0a82..cc51b5f5f52d8 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1402,13 +1402,6 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_ordered_del_inode(ip);
clear_inode(inode);
gfs2_dir_hash_inval(ip);
- if (ip->i_gl) {
- glock_clear_object(ip->i_gl, ip);
- wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
- gfs2_glock_add_to_lru(ip->i_gl);
- gfs2_glock_put_eventually(ip->i_gl);
- ip->i_gl = NULL;
- }
if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
@@ -1421,6 +1414,13 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_holder_uninit(&ip->i_iopen_gh);
gfs2_glock_put_eventually(gl);
}
+ if (ip->i_gl) {
+ glock_clear_object(ip->i_gl, ip);
+ wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
+ gfs2_glock_add_to_lru(ip->i_gl);
+ gfs2_glock_put_eventually(ip->i_gl);
+ ip->i_gl = NULL;
+ }
}
static struct inode *gfs2_alloc_inode(struct super_block *sb)
--
2.33.0
Reserving memory using efi_mem_reserve() calls into the x86
efi_arch_mem_reserve() function. This function will insert a new EFI
memory descriptor into the EFI memory map representing the area of
memory to be reserved and marking it as EFI runtime memory.
As part of adding this new entry, a new EFI memory map is allocated and
mapped. The mapping is where a problem can occur. This new EFI memory map
is mapped using early_memremap(). If the allocated memory comes from an
area that is marked as EFI_BOOT_SERVICES_DATA memory in the current EFI
memory map, then it will be mapped unencrypted (see memremap_is_efi_data()
and the call to efi_mem_type()).
However, during replacement of the old EFI memory map with the new EFI
memory map, efi_mem_type() is disabled, resulting in the new EFI memory
map always being mapped encrypted in efi.memmap. This will cause a kernel
crash later in the boot.
Since it is known that the new EFI memory map will always be mapped
encrypted when efi_memmap_install() is called, explicitly map the new EFI
memory map as encrypted (using early_memremap_prot()) when inserting the
new memory map entry.
Cc: <stable(a)vger.kernel.org> # 4.14.x
Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in the clear")
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
---
Changes for v2:
- Update/expand commit message to (hopefully) make it easier to read and
understand
- Add a comment around the use of the early_memremap_prot() call
---
arch/x86/platform/efi/quirks.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index b15ebfe40a73..14f8f20d727a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -277,7 +277,19 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
return;
}
- new = early_memremap(data.phys_map, data.size);
+ /*
+ * When SME is active, early_memremap() can map the memory unencrypted
+ * if the allocation came from EFI_BOOT_SERVICES_DATA (see
+ * memremap_is_efi_data() and the call to efi_mem_type()). However,
+ * when efi_memmap_install() is called to replace the memory map,
+ * efi_mem_type() is "disabled" and so the memory will always be mapped
+ * encrypted. To avoid this possible mismatch between the mappings,
+ * always map the newly allocated memmap memory as encrypted.
+ *
+ * When SME is not active, this behaves just like early_memremap().
+ */
+ new = early_memremap_prot(data.phys_map, data.size,
+ pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));
if (!new) {
pr_err("Failed to map new boot services memmap\n");
return;
--
2.33.1