From: Andrew Kanner andrew.kanner@gmail.com
[ Upstream commit cade5397e5461295f3cb87880534b6a07cafa427 ]
Syzkaller reported the following issue: ================================================================== BUG: KASAN: double-free in slab_free mm/slub.c:3787 [inline] BUG: KASAN: double-free in __kmem_cache_free+0x71/0x110 mm/slub.c:3800 Free of addr ffff888086408000 by task syz-executor.4/12750 [...] Call Trace: <TASK> [...] kasan_report_invalid_free+0xac/0xd0 mm/kasan/report.c:482 ____kasan_slab_free+0xfb/0x120 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1807 slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x71/0x110 mm/slub.c:3800 dbUnmount+0xf4/0x110 fs/jfs/jfs_dmap.c:264 jfs_umount+0x248/0x3b0 fs/jfs/jfs_umount.c:87 jfs_put_super+0x86/0x190 fs/jfs/super.c:194 generic_shutdown_super+0x130/0x310 fs/super.c:492 kill_block_super+0x79/0xd0 fs/super.c:1386 deactivate_locked_super+0xa7/0xf0 fs/super.c:332 cleanup_mnt+0x494/0x520 fs/namespace.c:1291 task_work_run+0x243/0x300 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop+0x124/0x150 kernel/entry/common.c:171 exit_to_user_mode_prepare+0xb2/0x140 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x26/0x60 kernel/entry/common.c:296 do_syscall_64+0x49/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK>
Allocated by task 13352: kasan_save_stack mm/kasan/common.c:45 [inline] kasan_set_track+0x3d/0x60 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] __kasan_kmalloc+0x97/0xb0 mm/kasan/common.c:380 kmalloc include/linux/slab.h:580 [inline] dbMount+0x54/0x980 fs/jfs/jfs_dmap.c:164 jfs_mount+0x1dd/0x830 fs/jfs/jfs_mount.c:121 jfs_fill_super+0x590/0xc50 fs/jfs/super.c:556 mount_bdev+0x26c/0x3a0 fs/super.c:1359 legacy_get_tree+0xea/0x180 fs/fs_context.c:610 vfs_get_tree+0x88/0x270 fs/super.c:1489 do_new_mount+0x289/0xad0 fs/namespace.c:3145 do_mount fs/namespace.c:3488 [inline] __do_sys_mount fs/namespace.c:3697 [inline] __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 13352: kasan_save_stack mm/kasan/common.c:45 [inline] kasan_set_track+0x3d/0x60 mm/kasan/common.c:52 kasan_save_free_info+0x27/0x40 mm/kasan/generic.c:518 ____kasan_slab_free+0xd6/0x120 mm/kasan/common.c:236 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1807 slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x71/0x110 mm/slub.c:3800 dbUnmount+0xf4/0x110 fs/jfs/jfs_dmap.c:264 jfs_mount_rw+0x545/0x740 fs/jfs/jfs_mount.c:247 jfs_remount+0x3db/0x710 fs/jfs/super.c:454 reconfigure_super+0x3bc/0x7b0 fs/super.c:935 vfs_fsconfig_locked fs/fsopen.c:254 [inline] __do_sys_fsconfig fs/fsopen.c:439 [inline] __se_sys_fsconfig+0xad5/0x1060 fs/fsopen.c:314 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...]
JFS_SBI(ipbmap->i_sb)->bmap wasn't set to NULL after kfree() in dbUnmount().
Syzkaller uses faultinject to reproduce this KASAN double-free warning. The issue is triggered if either diMount() or dbMount() fail in jfs_remount(), since diUnmount() or dbUnmount() already happened in such a case - they will do double-free on next execution: jfs_umount or jfs_remount.
Tested on both upstream and jfs-next by syzkaller.
Reported-and-tested-by: syzbot+6a93efb725385bc4b2e9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000471f2d05f1ce8bad@google.com/T/ Link: https://syzkaller.appspot.com/bug?extid=6a93efb725385bc4b2e9 Signed-off-by: Andrew Kanner andrew.kanner@gmail.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_dmap.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c index 8e8d53241386f..a785c747a8cbb 100644 --- a/fs/jfs/jfs_dmap.c +++ b/fs/jfs/jfs_dmap.c @@ -269,6 +269,7 @@ int dbUnmount(struct inode *ipbmap, int mounterror)
/* free the memory for the in-memory bmap. */ kfree(bmp); + JFS_SBI(ipbmap->i_sb)->bmap = NULL;
return (0); }
From: Liu Shixin via Jfs-discussion jfs-discussion@lists.sourceforge.net
[ Upstream commit 6e2bda2c192d0244b5a78b787ef20aa10cb319b7 ]
syzbot found an invalid-free in diUnmount:
BUG: KASAN: double-free in slab_free mm/slub.c:3661 [inline] BUG: KASAN: double-free in __kmem_cache_free+0x71/0x110 mm/slub.c:3674 Free of addr ffff88806f410000 by task syz-executor131/3632
CPU: 0 PID: 3632 Comm: syz-executor131 Not tainted 6.1.0-rc7-syzkaller-00012-gca57f02295f1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report_invalid_free+0xac/0xd0 mm/kasan/report.c:460 ____kasan_slab_free+0xfb/0x120 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1724 [inline] slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1750 slab_free mm/slub.c:3661 [inline] __kmem_cache_free+0x71/0x110 mm/slub.c:3674 diUnmount+0xef/0x100 fs/jfs/jfs_imap.c:195 jfs_umount+0x108/0x370 fs/jfs/jfs_umount.c:63 jfs_put_super+0x86/0x190 fs/jfs/super.c:194 generic_shutdown_super+0x130/0x310 fs/super.c:492 kill_block_super+0x79/0xd0 fs/super.c:1428 deactivate_locked_super+0xa7/0xf0 fs/super.c:332 cleanup_mnt+0x494/0x520 fs/namespace.c:1186 task_work_run+0x243/0x300 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x664/0x2070 kernel/exit.c:820 do_group_exit+0x1fd/0x2b0 kernel/exit.c:950 __do_sys_exit_group kernel/exit.c:961 [inline] __se_sys_exit_group kernel/exit.c:959 [inline] __x64_sys_exit_group+0x3b/0x40 kernel/exit.c:959 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...]
JFS_IP(ipimap)->i_imap is not setting to NULL after free in diUnmount. If jfs_remount() free JFS_IP(ipimap)->i_imap but then failed at diMount(). JFS_IP(ipimap)->i_imap will be freed once again. Fix this problem by setting JFS_IP(ipimap)->i_imap to NULL after free.
Reported-by: syzbot+90a11e6b1e810785c6ff@syzkaller.appspotmail.com Signed-off-by: Liu Shixin liushixin2@huawei.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/jfs_imap.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/jfs/jfs_imap.c b/fs/jfs/jfs_imap.c index 937ca07b58b1d..67c67604b8c85 100644 --- a/fs/jfs/jfs_imap.c +++ b/fs/jfs/jfs_imap.c @@ -195,6 +195,7 @@ int diUnmount(struct inode *ipimap, int mounterror) * free in-memory control structure */ kfree(imap); + JFS_IP(ipimap)->i_imap = NULL;
return (0); }
From: Niklas Schnelle schnelle@linux.ibm.com
[ Upstream commit f768c75d61582b011962f9dcb9ff8eafb8da0383 ]
In the future inw() and friends will not be compiled on architectures without I/O port support.
Co-developed-by: Arnd Bergmann arnd@kernel.org Link: https://lore.kernel.org/r/20230703135255.2202721-2-schnelle@linux.ibm.com Signed-off-by: Arnd Bergmann arnd@kernel.org Signed-off-by: Niklas Schnelle schnelle@linux.ibm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 73260bd217278..947c2c2d8c8c5 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -270,6 +270,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_CBUS_1, quirk_isa_d DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_CBUS_2, quirk_isa_dma_hangs); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_CBUS_3, quirk_isa_dma_hangs);
+#ifdef CONFIG_HAS_IOPORT /* * Intel NM10 "TigerPoint" LPC PM1a_STS.BM_STS must be clear * for some HT machines to use C4 w/o hanging. @@ -289,6 +290,7 @@ static void quirk_tigerpoint_bm_sts(struct pci_dev *dev) } } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_TGP_LPC, quirk_tigerpoint_bm_sts); +#endif
/* Chipsets where PCI->PCI transfers vanish or hang */ static void quirk_nopcipci(struct pci_dev *dev)
From: Tomislav Novak tnovak@fb.com
[ Upstream commit e6b51532d5273eeefba84106daea3d392c602837 ]
Arm platforms use is_default_overflow_handler() to determine if the hw_breakpoint code should single-step over the breakpoint trigger or let the custom handler deal with it.
Since bpf_overflow_handler() currently isn't recognized as a default handler, attaching a BPF program to a PERF_TYPE_BREAKPOINT event causes it to keep firing (the instruction triggering the data abort exception is never skipped). For example:
# bpftrace -e 'watchpoint:0x10000:4:w { print("hit") }' -c ./test Attaching 1 probe... hit hit [...] ^C
(./test performs a single 4-byte store to 0x10000)
This patch replaces the check with uses_default_overflow_handler(), which accounts for the bpf_overflow_handler() case by also testing if one of the perf_event_output functions gets invoked indirectly, via orig_default_handler.
Link: https://lore.kernel.org/linux-arm-kernel/20220923203644.2731604-1-tnovak@fb....
Signed-off-by: Tomislav Novak tnovak@fb.com Tested-by: Samuel Gosselin sgosselin@google.com # arm64 Reviewed-by: Catalin Marinas catalin.marinas@arm.com Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/kernel/hw_breakpoint.c | 8 ++++---- arch/arm64/kernel/hw_breakpoint.c | 4 ++-- include/linux/perf_event.h | 22 +++++++++++++++++++--- 3 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index b06d9ea07c846..a69dd64a84017 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -623,7 +623,7 @@ int hw_breakpoint_arch_parse(struct perf_event *bp, hw->address &= ~alignment_mask; hw->ctrl.len <<= offset;
- if (is_default_overflow_handler(bp)) { + if (uses_default_overflow_handler(bp)) { /* * Mismatch breakpoints are required for single-stepping * breakpoints. @@ -795,7 +795,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr, * Otherwise, insert a temporary mismatch breakpoint so that * we can single-step over the watchpoint trigger. */ - if (!is_default_overflow_handler(wp)) + if (!uses_default_overflow_handler(wp)) continue; step: enable_single_step(wp, instruction_pointer(regs)); @@ -808,7 +808,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr, info->trigger = addr; pr_debug("watchpoint fired: address = 0x%x\n", info->trigger); perf_bp_event(wp, regs); - if (is_default_overflow_handler(wp)) + if (uses_default_overflow_handler(wp)) enable_single_step(wp, instruction_pointer(regs)); }
@@ -883,7 +883,7 @@ static void breakpoint_handler(unsigned long unknown, struct pt_regs *regs) info->trigger = addr; pr_debug("breakpoint fired: address = 0x%x\n", addr); perf_bp_event(bp, regs); - if (is_default_overflow_handler(bp)) + if (uses_default_overflow_handler(bp)) enable_single_step(bp, addr); goto unlock; } diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index b4a1607958246..534578eba556e 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -654,7 +654,7 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr, perf_bp_event(bp, regs);
/* Do we need to handle the stepping? */ - if (is_default_overflow_handler(bp)) + if (uses_default_overflow_handler(bp)) step = 1; unlock: rcu_read_unlock(); @@ -733,7 +733,7 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val, static int watchpoint_report(struct perf_event *wp, unsigned long addr, struct pt_regs *regs) { - int step = is_default_overflow_handler(wp); + int step = uses_default_overflow_handler(wp); struct arch_hw_breakpoint *info = counter_arch_bp(wp);
info->trigger = addr; diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index b7ac395513c0f..c99e2f851d312 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1018,15 +1018,31 @@ extern int perf_event_output(struct perf_event *event, struct pt_regs *regs);
static inline bool -is_default_overflow_handler(struct perf_event *event) +__is_default_overflow_handler(perf_overflow_handler_t overflow_handler) { - if (likely(event->overflow_handler == perf_event_output_forward)) + if (likely(overflow_handler == perf_event_output_forward)) return true; - if (unlikely(event->overflow_handler == perf_event_output_backward)) + if (unlikely(overflow_handler == perf_event_output_backward)) return true; return false; }
+#define is_default_overflow_handler(event) \ + __is_default_overflow_handler((event)->overflow_handler) + +#ifdef CONFIG_BPF_SYSCALL +static inline bool uses_default_overflow_handler(struct perf_event *event) +{ + if (likely(is_default_overflow_handler(event))) + return true; + + return __is_default_overflow_handler(event->orig_overflow_handler); +} +#else +#define uses_default_overflow_handler(event) \ + is_default_overflow_handler(event) +#endif + extern void perf_event_header__init_id(struct perf_event_header *header, struct perf_sample_data *data,
From: ruanjinjie ruanjinjie@huawei.com
[ Upstream commit afda85b963c12947e298ad85d757e333aa40fd74 ]
If device_register() returns error in ibmebus_bus_init(), name of kobject which is allocated in dev_set_name() called in device_add() is leaked.
As comment of device_add() says, it should call put_device() to drop the reference count that was set in device_initialize() when it fails, so the name can be freed in kobject_cleanup().
Signed-off-by: ruanjinjie ruanjinjie@huawei.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20221110011929.3709774-1-ruanjinjie@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/pseries/ibmebus.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/platforms/pseries/ibmebus.c b/arch/powerpc/platforms/pseries/ibmebus.c index b91eb0929ed14..55569e3c9db72 100644 --- a/arch/powerpc/platforms/pseries/ibmebus.c +++ b/arch/powerpc/platforms/pseries/ibmebus.c @@ -450,6 +450,7 @@ static int __init ibmebus_bus_init(void) if (err) { printk(KERN_WARNING "%s: device_register returned %i\n", __func__, err); + put_device(&ibmebus_bus_device); bus_unregister(&ibmebus_bus_type);
return err;
From: Mahesh Salgaonkar mahesh@linux.ibm.com
[ Upstream commit 77583f77ed9b1452ac62caebf09b2206da10bbf9 ]
When certain PHB HW failure causes pHyp to recover PHB, it marks the PE state as temporarily unavailable until recovery is complete. This also triggers an EEH handler in Linux which needs to notify drivers, and perform recovery. But before notifying the driver about the PCI error it uses get_adapter_status()->rpaphp_get_sensor_state()->rtas_call(get-sensor-state) operation of the hotplug_slot to determine if the slot contains a device or not. If the slot is empty, the recovery is skipped entirely.
eeh_event_handler() ->eeh_handle_normal_event() ->eeh_slot_presence_check() ->get_adapter_status() ->rpaphp_get_sensor_state() ->rtas_get_sensor() ->rtas_call(get-sensor-state)
However on certain PHB failures, the RTAS call rtas_call(get-sensor-state) returns extended busy error (9902) until PHB is recovered by pHyp. Once PHB is recovered, the rtas_call(get-sensor-state) returns success with correct presence status. The RTAS call interface rtas_get_sensor() loops over the RTAS call on extended delay return code (9902) until the return value is either success (0) or error (-1). This causes the EEH handler to get stuck for ~6 seconds before it could notify that the PCI error has been detected and stop any active operations. Hence with running I/O traffic, during this 6 seconds, the network driver continues its operation and hits a timeout (netdev watchdog).
------------ [52732.244731] DEBUG: ibm_read_slot_reset_state2() [52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=> [52732.244798] DEBUG: in eeh_slot_presence_check [52732.244804] DEBUG: error state check [52732.244807] DEBUG: Is slot hotpluggable [52732.244810] DEBUG: hotpluggable ops ? [52732.244953] DEBUG: Calling ops->get_adapter_status [52732.244958] DEBUG: calling rpaphp_get_sensor_state [52736.564262] ------------[ cut here ]------------ [52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o> [52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev> [...] [52736.564505] NIP [c000000000c32368] dev_watchdog+0x438/0x440 [52736.564513] LR [c000000000c32364] dev_watchdog+0x434/0x440 ------------
On timeouts, network driver starts dumping debug information to console (e.g bnx2 driver calls bnx2x_panic_dump()), and go into recovery path while pHyp is still recovering the PHB. As part of recovery, the driver tries to reset the device and it keeps failing since every PCI read/write returns ff's. And when EEH recovery kicks-in, the driver is unable to recover the device. This impacts the ssh connection and leads to the system being inaccessible. To get the NIC working again it needs a reboot or re-assign the I/O adapter from HMC.
[ 9531.168587] EEH: Beginning: 'slot_reset' [ 9531.168601] PCI 0013:01:00.0#10000: EEH: Invoking bnx2x->slot_reset() [...] [ 9614.110094] bnx2x: [bnx2x_func_stop:9129(enP19p1s0f0)]FUNC_STOP ramrod failed. Running a dry transaction [ 9614.110300] bnx2x: [bnx2x_igu_int_disable:902(enP19p1s0f0)]BUG! Proper val not read from IGU! [ 9629.178067] bnx2x: [bnx2x_fw_command:3055(enP19p1s0f0)]FW failed to respond! [ 9629.178085] bnx2x 0013:01:00.0 enP19p1s0f0: bc 7.10.4 [ 9629.178091] bnx2x: [bnx2x_fw_dump_lvl:789(enP19p1s0f0)]Cannot dump MCP info while in PCI error [ 9644.241813] bnx2x: [bnx2x_io_slot_reset:14245(enP19p1s0f0)]IO slot reset --> driver unload [...] [ 9644.241819] PCI 0013:01:00.0#10000: EEH: bnx2x driver reports: 'disconnect' [ 9644.241823] PCI 0013:01:00.1#10000: EEH: Invoking bnx2x->slot_reset() [ 9644.241827] bnx2x: [bnx2x_io_slot_reset:14229(enP19p1s0f1)]IO slot reset initializing... [ 9644.241916] bnx2x 0013:01:00.1: enabling device (0140 -> 0142) [ 9644.258604] bnx2x: [bnx2x_io_slot_reset:14245(enP19p1s0f1)]IO slot reset --> driver unload [ 9644.258612] PCI 0013:01:00.1#10000: EEH: bnx2x driver reports: 'disconnect' [ 9644.258615] EEH: Finished:'slot_reset' with aggregate recovery state:'disconnect' [ 9644.258620] EEH: Unable to recover from failure from PHB#13-PE#10000. [ 9644.261811] EEH: Beginning: 'error_detected(permanent failure)' [...] [ 9644.261823] EEH: Finished:'error_detected(permanent failure)'
Hence, it becomes important to inform driver about the PCI error detection as early as possible, so that driver is aware of PCI error and waits for EEH handler's next action for successful recovery.
Current implementation uses rtas_get_sensor() API which blocks the slot check state until RTAS call returns success. To avoid this, fix the PCI hotplug driver (rpaphp) to return an error (-EBUSY) if the slot presence state can not be detected immediately while PE is in EEH recovery state. Change rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state) directly only if the respective PE is in EEH recovery state, and take actions based on RTAS return status. This way EEH handler will not be blocked on rpaphp_get_sensor_state() and can immediately notify driver about the PCI error and stop any active operations.
In normal cases (non-EEH case) rpaphp_get_sensor_state() will continue to invoke rtas_get_sensor() as it was earlier with no change in existing behavior.
Signed-off-by: Mahesh Salgaonkar mahesh@linux.ibm.com Reviewed-by: Nathan Lynch nathanl@linux.ibm.com Acked-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/169235815601.193557.13989873835811325343.stgit@jupiter Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/hotplug/rpaphp_pci.c | 85 ++++++++++++++++++++++++++++++-- 1 file changed, 82 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c index beca61badeea4..5228d0044caa9 100644 --- a/drivers/pci/hotplug/rpaphp_pci.c +++ b/drivers/pci/hotplug/rpaphp_pci.c @@ -18,12 +18,92 @@ #include "../pci.h" /* for pci_add_new_bus */ #include "rpaphp.h"
+/* + * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR: + * -- generic return codes --- + * -1: Hardware Error + * -2: RTAS_BUSY + * -3: Invalid sensor. RTAS Parameter Error. + * -- rtas_get_sensor function specific return codes --- + * -9000: Need DR entity to be powered up and unisolated before RTAS call + * -9001: Need DR entity to be powered up, but not unisolated, before RTAS call + * -9002: DR entity unusable + * 990x: Extended delay - where x is a number in the range of 0-5 + */ +#define RTAS_SLOT_UNISOLATED -9000 +#define RTAS_SLOT_NOT_UNISOLATED -9001 +#define RTAS_SLOT_NOT_USABLE -9002 + +static int rtas_get_sensor_errno(int rtas_rc) +{ + switch (rtas_rc) { + case 0: + /* Success case */ + return 0; + case RTAS_SLOT_UNISOLATED: + case RTAS_SLOT_NOT_UNISOLATED: + return -EFAULT; + case RTAS_SLOT_NOT_USABLE: + return -ENODEV; + case RTAS_BUSY: + case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX: + return -EBUSY; + default: + return rtas_error_rc(rtas_rc); + } +} + +/* + * get_adapter_status() can be called by the EEH handler during EEH recovery. + * On certain PHB failures, the RTAS call rtas_call(get-sensor-state) returns + * extended busy error (9902) until PHB is recovered by pHyp. The RTAS call + * interface rtas_get_sensor() loops over the RTAS call on extended delay + * return code (9902) until the return value is either success (0) or error + * (-1). This causes the EEH handler to get stuck for ~6 seconds before it + * could notify that the PCI error has been detected and stop any active + * operations. This sometimes causes EEH recovery to fail. To avoid this issue, + * invoke rtas_call(get-sensor-state) directly if the respective PE is in EEH + * recovery state and return -EBUSY error based on RTAS return status. This + * will help the EEH handler to notify the driver about the PCI error + * immediately and successfully proceed with EEH recovery steps. + */ + +static int __rpaphp_get_sensor_state(struct slot *slot, int *state) +{ + int rc; + int token = rtas_token("get-sensor-state"); + struct pci_dn *pdn; + struct eeh_pe *pe; + struct pci_controller *phb = PCI_DN(slot->dn)->phb; + + if (token == RTAS_UNKNOWN_SERVICE) + return -ENOENT; + + /* + * Fallback to existing method for empty slot or PE isn't in EEH + * recovery. + */ + pdn = list_first_entry_or_null(&PCI_DN(phb->dn)->child_list, + struct pci_dn, list); + if (!pdn) + goto fallback; + + pe = eeh_dev_to_pe(pdn->edev); + if (pe && (pe->state & EEH_PE_RECOVERING)) { + rc = rtas_call(token, 2, 2, state, DR_ENTITY_SENSE, + slot->index); + return rtas_get_sensor_errno(rc); + } +fallback: + return rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state); +} + int rpaphp_get_sensor_state(struct slot *slot, int *state) { int rc; int setlevel;
- rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state); + rc = __rpaphp_get_sensor_state(slot, state);
if (rc < 0) { if (rc == -EFAULT || rc == -EEXIST) { @@ -39,8 +119,7 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state) dbg("%s: power on slot[%s] failed rc=%d.\n", __func__, slot->name, rc); } else { - rc = rtas_get_sensor(DR_ENTITY_SENSE, - slot->index, state); + rc = __rpaphp_get_sensor_state(slot, state); } } else if (rc == -ENODEV) info("%s: slot is unusable\n", __func__);
linux-stable-mirror@lists.linaro.org