On Wed, Oct 16, 2024 at 04:50:56PM +0800, kernel test robot wrote:
Hello,
kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:
Thanks, see below for analysis.
commit: e65dbb5c9051a4da2305787fd558e1d60de2275a ("[PATCH v2 1/3] pidfd: extend pidfd_get_pid() and de-duplicate pid lookup")
url: https://github.com/intel-lab-lkp/linux/commits/Lorenzo-Stoakes/pidfd-extend-...
base: https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next
patch link: https://lore.kernel.org/all/8e7edaf2f648fb01a71def749f17f76c0502dee1.1728643...
patch subject: [PATCH v2 1/3] pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 600s

config: x86_64-randconfig-072-20241015
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202410161634.abca3854-lkp@intel.com
[ 416.054386][ T1959] BUG: unable to handle page fault for address: ffffffff8fed9474
[ 416.055651][ T1959] #PF: supervisor write access in kernel mode
[ 416.056550][ T1959] #PF: error_code(0x0003) - permissions violation
[ 416.057502][ T1959] PGD 3e90f5067 P4D 3e90f5067 PUD 3e90f6063 PMD 3e50001a1
[ 416.058587][ T1959] Oops: Oops: 0003 [#1] PREEMPT SMP KASAN
[ 416.059414][ T1959] CPU: 1 UID: 65534 PID: 1959 Comm: trinity-c3 Not tainted 6.12.0-rc1-00004-ge65dbb5c9051 #1 d7a38916ac9252f968706afc2c77f70fbdabe689
[ 416.061328][ T1959] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 416.062850][ T1959] RIP: 0010:fput (arch/x86/include/asm/atomic64_64.h:61 include/linux/atomic/atomic-arch-fallback.h:4404 include/linux/atomic/atomic-long.h:1571 include/linux/atomic/atomic-instrumented.h:4540 fs/file_table.c:482)
[ 416.063578][ T1959] Code: ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 55 48 89 e5 41 55 41 54 53 48 89 fb be 08 00 00 00 e8 96 c6 f7 ff <f0> 48 ff 0b 0f 85 dd 00 00 00 65 4c 8b 25 04 ff 0e 70 4c 8d 6b 48
All code
========
   0:	ff                   	(bad)
   1:	ff 66 66             	jmp    *0x66(%rsi)
   4:	2e 0f 1f 84 00 00 00 	cs nopl 0x0(%rax,%rax,1)
   b:	00 00
   d:	0f 1f 00             	nopl   (%rax)
  10:	f3 0f 1e fa          	endbr64
  14:	55                   	push   %rbp
  15:	48 89 e5             	mov    %rsp,%rbp
  18:	41 55                	push   %r13
  1a:	41 54                	push   %r12
  1c:	53                   	push   %rbx
  1d:	48 89 fb             	mov    %rdi,%rbx
  20:	be 08 00 00 00       	mov    $0x8,%esi
  25:	e8 96 c6 f7 ff       	call   0xfffffffffff7c6c0
  2a:*	f0 48 ff 0b          	lock decq (%rbx)		<-- trapping instruction
OK, so this looks like fput() invoking atomic_long_dec_and_test() on an invalid &file->f_count.
It looks like 0xffffffff8fed9474 in RBX is the file...
And that's because I'm not setting f on error in SYSCALL_DEFINE4(pidfd_send_signal, ...) at:

	pidfd_to_pid_proc(pidfd, &f_flags, &f);

and yet on error we still jump to

err:
	fdput(f);
	return ret;

which then tries to fdput() (and thus fput()) an f that was never set, ugh.
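To make the failure mode concrete, here is a minimal userspace sketch of the pattern. It is not the kernel code and not the actual respin: mock_file, mock_fd, mock_fdput(), mock_lookup() and mock_send_signal() are all hypothetical stand-ins, and initialising the handle up front is just one possible way to make the shared err: path safe when the lookup fails without touching its output argument.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct file / struct fd. */
struct mock_file { long f_count; };
struct mock_fd { struct mock_file *file; };

static int mock_puts; /* counts how many references were dropped */

/* Drop a reference only if the handle actually holds a file. */
static void mock_fdput(struct mock_fd f)
{
	if (f.file) {
		assert(f.file->f_count > 0);
		f.file->f_count--;
		mock_puts++;
	}
}

/* Hypothetical lookup: on failure it leaves *f untouched. */
static int mock_lookup(int fd, struct mock_file *backing, struct mock_fd *f)
{
	if (fd < 0)
		return -EBADF;
	backing->f_count++; /* take a reference */
	f->file = backing;
	return 0;
}

static int mock_send_signal(int fd, struct mock_file *backing)
{
	/* Initialised so the shared err: path is safe if the lookup fails. */
	struct mock_fd f = { NULL };
	int ret;

	ret = mock_lookup(fd, backing, &f);
	if (ret)
		goto err;

	/* ... the real work would happen here ... */

err:
	mock_fdput(f);
	return ret;
}
```

Without the initialiser, the failing path would hand whatever happened to be on the stack to mock_fdput(), which is the userspace analogue of the oops above.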
OK I will fix this + respin, thanks for the report!
[snip]