Hello!
Following up on the initial discussion https://lore.kernel.org/all/20220701110341.3094023-1-s.hauer@pengutronix.de which led to the revert commit: are there any plans to fix this issue for 5.10.y (and maybe other stable branches)?
Thanks in advance!
On Thu, Jun 22, 2023 at 6:46 AM Kegl Rohit keglrohit@gmail.com wrote:
After reverting the revert :), the data corruption did not happen anymore!
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi...
On Wed, Jun 21, 2023 at 7:55 PM Kegl Rohit keglrohit@gmail.com wrote:
ok, looking at the 5.10.184 gpmi-nand.c:
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
	(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)
hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
and then 5.19 (upstream patch source 0fddf9ad06fd9f439f137139861556671673e31c) https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673e...
hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096));
This difference could be the cause: DIV_ROUND_UP is a division, whereas 5.10.184 still has busy_timeout_cycles * 4096, a multiplication!
The backport is wrong because commit cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d was reverted on the 5.10 kernel tree but not on mainline. https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi...
=> now in 5.10.184 this line "hw->timing1 ..." is wrong!
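The two expressions can be compared outside the driver. The following is a minimal user-space sketch, not the driver code. The macro is copied from above; the values of BP_GPMI_TIMING1_BUSY_TIMEOUT (16) and BM_GPMI_TIMING1_BUSY_TIMEOUT (0xffff << 16) are assumptions about the register layout, DIV_ROUND_UP is re-implemented locally, busy_timeout_cycles is assumed to be 32 bits wide, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: a 16-bit BUSY_TIMEOUT field in bits 31:16 of TIMING1. */
#define BP_GPMI_TIMING1_BUSY_TIMEOUT 16
#define BM_GPMI_TIMING1_BUSY_TIMEOUT (0xffffu << BP_GPMI_TIMING1_BUSY_TIMEOUT)
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
	(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)

/* Local stand-in for the kernel's DIV_ROUND_UP. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: timing1 as computed by the 5.10.184 line. */
static uint32_t timing1_mul(uint32_t busy_timeout_cycles)
{
	return BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096u);
}

/* Hypothetical helper: timing1 as computed by the 5.19 line. */
static uint32_t timing1_div(uint32_t busy_timeout_cycles)
{
	return BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096u));
}

/* Self-check using the busy_timeout_cycles_new values from the boot log. */
static void check_logged_values(void)
{
	assert(timing1_mul(350000u) >> 16 == 0u);       /* field wraps to 0 */
	assert(timing1_mul(1424705u) >> 16 == 0x1000u); /* only low bits survive */
	assert(timing1_div(350000u) >> 16 == 86u);
	assert(timing1_div(1424705u) >> 16 == 348u);
}
```

Under these assumptions, the multiplying variant leaves the field at 0 for the logged value 350000, while the dividing variant yields 86; for 1424705 the two variants yield 0x1000 and 348.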
I will test this tomorrow.
On Wed, Jun 21, 2023 at 5:26 PM han.xu han.xu@nxp.com wrote:
On 23/06/21 04:27PM, Kegl Rohit wrote:
Hello!
Using imx7d and the rt stable kernel tree.
After upgrading to v5.10.184-rt90 the rootfs ubifs mtd partition got corrupted. https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/?...
After reverting the latest patch (e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition did not get corrupted. https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi...
The commit message states the timeout calculation was changed. Here are the calculated timeouts `busy_timeout_cycles` before (_old) and after the patch (_new):
[ 0.491534] busy_timeout_cycles_old 4353
[ 0.491604] busy_timeout_cycles_new 1424705
[ 0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
[ 0.492310] nand: Macronix MX30LF4G28AC
[ 0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 112
[ 0.492488] busy_timeout_cycles_old 4353
[ 0.492493] busy_timeout_cycles_new 1424705
[ 0.492863] busy_timeout_cycles_old 2510
[ 0.492872] busy_timeout_cycles_new 350000
The new timeouts are a lot higher. Higher timeouts should not be an issue; lower timeouts could be. But because of these high timeouts gpmi-nand is broken for us.
For now we simply reverted the change. The new calculations seem to be flaky; a previous "fix backport" was already reverted because of data corruption. https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi...
Any guesses why the high timeout causes issues?
A high timeout combined with the wrong calculation may overflow, causing the DEVICE_BUSY_TIMEOUT register to end up as 0.
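One way to see how the value can end up as 0 is the sketch below. It assumes DEVICE_BUSY_TIMEOUT is a 16-bit field (an assumption about the register layout, not stated in this thread), so only the low 16 bits of the computed value survive. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value that lands in a 16-bit register field;
 * all higher bits of v are silently dropped by the field mask. */
static uint16_t busy_timeout_field(uint64_t v)
{
	return (uint16_t)(v & 0xffffu);
}

/* Hypothetical helper: nonzero if v fits in the field without truncation. */
static int busy_timeout_fits(uint64_t v)
{
	return v <= 0xffffu;
}

/* Self-check with busy_timeout_cycles_new = 350000 from the log above. */
static void check_truncation(void)
{
	uint64_t v = 350000ull * 4096ull;   /* 0x55730000 */

	assert(!busy_timeout_fits(v));      /* far too large for 16 bits */
	assert(busy_timeout_field(v) == 0); /* low 16 bits are all zero */
}
```

Any multiple of 65536 truncates to 0 this way, and 350000 * 4096 happens to be one.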
Thanks in advance!
Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/