Il 24 marzo 2024 19:51:19 CET, Avri Altman Avri.Altman@wdc.com ha scritto:
Hello Mikko and Avri,
On Wed, Mar 13, 2024 at 03:37:44PM +0200, mikko.rapeli@linaro.org wrote:
From: Mikko Rapeli mikko.rapeli@linaro.org
Commit "mmc: core: Use mrq.sbc in close-ended ffu" assigns prev_idata = idatas[i - 1] but doesn't check that int iterator i is greater than zero. Add the check.
Fixes: 4d0c8d0aef63 ("mmc: core: Use mrq.sbc in close-ended ffu")
Link: https://lore.kernel.org/all/20231129092535.3278-1-avri.altman@wdc.com/
Cc: Avri Altman avri.altman@wdc.com Cc: Ulf Hansson ulf.hansson@linaro.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: linux-mmc@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Mikko Rapeli mikko.rapeli@linaro.org
I just had the following Oops
[ 31.377291] Unable to handle kernel paging request at virtual address 0000fffffc386a14 [ 31.385348] Mem abort info: [ 31.388136] ESR = 0x0000000096000006 [ 31.392338] EC = 0x25: DABT (current EL), IL = 32 bits [ 31.397681] SET = 0, FnV = 0 [ 31.400730] EA = 0, S1PTW = 0 [ 31.405397] FSC = 0x06: level 2 translation fault [ 31.410355] Data abort info: [ 31.413245] ISV = 0, ISS = 0x00000006 [ 31.417086] CM = 0, WnR = 0 [ 31.420049] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000084f89000 [ 31.426552] [0000fffffc386a14] pgd=0800000084af2003, p4d=0800000084af2003, pud=0800000083ec0003, pmd=0000000000000000 [ 31.437393] Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP [ 31.443657] Modules linked in: crct10dif_ce ti_k3_r5_remoteproc virtio_rpmsg_bus rpmsg_ns rtc_ti_k3 ti_k3_m4_remoteproc ti_k3_common tidss drm_dma_helper mcrc sa2ul lontium_lt8912b tc358768 display_connector drm_kms_helper ina2xx syscopyarea sysfillrect sysimgblt fb_sys_fops spi_omap2_mcspi pwm_tiehrpwm drm lm75 drm_panel_orientation_quirks optee_rng rng_core [ 31.475530] CPU: 0 PID: 8 Comm: kworker/0:0H Not tainted 6.1.80+git.ba628d222cde #1 [ 31.483179] Hardware name: Toradex Verdin AM62 on Verdin Development Board (DT) [ 31.490480] Workqueue: kblockd blk_mq_run_work_fn [ 31.495216] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=-- ) [ 31.502172] pc : __mmc_blk_ioctl_cmd+0x12c/0x590 [ 31.506795] lr : __mmc_blk_ioctl_cmd+0x2cc/0x590 [ 31.511408] sp : ffff8000092a39e0 [ 31.514717] x29: ffff8000092a3b50 x28: ffff8000092a3d28 x27: 0000000000000000 [ 31.521853] x26: ffff80000a5a3cf0 x25: ffff000018bbb400 x24: 0000fffffc386a08 [ 31.528989] x23: ffff000018a8b808 x22: 0000000000000000 x21: 00000000ffffffff [ 31.536124] x20: ffff000018a8b800 x19: ffff0000048c6680 x18: 0000000000000000 [ 31.543260] x17: 0000000000000000 x16: 0000000000000000 x15: 0000146d78b52ba4 [ 31.550394] x14: 0000000000000206 x13: 0000000000000001 x12: 0000000000000000 [ 31.557529] x11: 0000000000000000 x10: 00000000000009b0 x9 : 0000000000000651 [ 31.564664] x8 : ffff8000092a3ad8 x7 : 0000000000000000 x6 : 0000000000000000 [ 31.571800] x5 : 0000000000000200 x4 : 0000000000000000 x3 : 00000000000003e8 [ 31.578935] x2 : 0000000000000000 x1 : 000000000000001d x0 : 0000000000000017 [ 31.586071] Call trace: [ 31.588513] __mmc_blk_ioctl_cmd+0x12c/0x590 [ 31.592782] mmc_blk_mq_issue_rq+0x50c/0x920 [ 31.597049] mmc_mq_queue_rq+0x118/0x2ac [ 31.600970] blk_mq_dispatch_rq_list+0x1a8/0x8b0 [ 31.605588] __blk_mq_sched_dispatch_requests+0xb8/0x164 [ 31.610898] blk_mq_sched_dispatch_requests+0x3c/0x80 [ 31.615946] __blk_mq_run_hw_queue+0x68/0xa0 [ 31.620215] blk_mq_run_work_fn+0x20/0x30 [ 31.624223] process_one_work+0x1d0/0x320 [ 31.628238] worker_thread+0x14c/0x444 [ 31.631989] kthread+0x10c/0x110 [ 31.635219] ret_from_fork+0x10/0x20 [ 31.638801] Code: 12010000 2a010000 b90137e0 b4000078 (b9400f00) [ 31.644888] ---[ end trace 0000000000000000 ]---
From a quick look I assume that this is the exact same issue you are fixing here, correct?
Probably. Did you applied the patch and the issue persists?
It's not systematic, probably it depends on what is in the memory at this negative indexed array.
I would try to confirm is the exact same issue tomorrow. Worth mentioning that on our system this leads to some pretty bad failures (data corruption).
Considering that the buggy commit was back ported to LTS kernel (6.1, in my case), we should have this into mainline as soon as possible, so that the fix can also be backported.
Francesco