From: Dan Carpenter error27@gmail.com Date: Fri, 23 Dec 2022 12:04:18 +0300
On Thu, Dec 22, 2022 at 06:21:03PM +0530, Prashanth K wrote:
On 14-12-22 11:05 pm, David Laight wrote:
From: Greg Kroah-Hartman
Sent: 12 December 2022 13:35
On Mon, Dec 12, 2022 at 06:54:24PM +0530, Prashanth K wrote:
Function pointer ki_complete() expects 'long' as its second argument, but we pass integer from ffs_user_copy_worker. This might cause a CFI failure, as ki_complete is an indirect call with mismatched prototype. Fix this by typecasting the second argument to long.
"might"? Does it or not? If it does, why hasn't this been reported before?
Does the cast even help at all.
Actually I also have these same questions
- why we haven't seen any instances other than this one?
- why its not seen on other indirect function calls?
Here is the the call stack of the failure that we got.
[ 323.288681][ T7] Kernel panic - not syncing: CFI failure (target: 0xffffffe5fc811f98) [ 323.288710][ T7] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G S W OE 5.15.41-android13-8-g5ffc5644bd20 #1 [ 323.288730][ T7] Workqueue: adb ffs_user_copy_worker.cfi_jt [ 323.288752][ T7] Call trace: [ 323.288755][ T7] dump_backtrace.cfi_jt+0x0/0x8 [ 323.288772][ T7] dump_stack_lvl+0x80/0xb8 [ 323.288785][ T7] panic+0x180/0x444 [ 323.288797][ T7] find_check_fn+0x0/0x218 [ 323.288810][ T7] ffs_user_copy_worker+0x1dc/0x204 [ 323.288822][ T7] kretprobe_trampoline.cfi_jt+0x0/0x8 [ 323.288837][ T7] worker_thread+0x3ec/0x920 [ 323.288850][ T7] kthread+0x168/0x1dc [ 323.288859][ T7] ret_from_fork+0x10/0x20 [ 323.288866][ T7] SMP: stopping secondary CPUs
And from address to line translation, we got know the issue is from ffs_user_copy_worker+0x1dc/0x204 || io_data->kiocb->ki_complete(io_data->kiocb, ret);
And "find_check_fn" was getting invoked from ki_complete. Only thing that I found suspicious about ki_complete() is its argument types. That's why I pushed this patch here, so that we can discuss this out here.
I think the problem is more likely whatever ->ki_complete() points to but I have no idea what that is on your system. You're using an Android kernel so it could be something out of tree as well...
Correct, CFI would *never* trigger a failure due to passing int as long. It triggers only on prototype-implementation mismatches. The author should go and check carefully whether there are any places where some implementation differs and then ::ki_complete() gets passed with a function typecast. Also, there can be places where a proto has an argument as enum and an implementation has it as int. Compilers don't warn on such mismatches, CFI does. The latest LLVM Git snapshot with `-Wcast-function-type-strict` enabled could help hunt such.
regards, dan carpenter
Thanks, Olek