Hi Kegl,
keglrohit@gmail.com wrote on Sun, 25 Jun 2023 11:11:52 +0200:
Hello!
Following up on the initial discussion (https://lore.kernel.org/all/20220701110341.3094023-1-s.hauer@pengutronix.de), which led to the revert commit: are there any plans to fix this issue in 5.10.y (and maybe other stable branches)?
If the Fixes: tags are correct, all relevant branches that are still maintained should receive the final fix. If that is not the case, it means the stable maintainers could not apply the patch as-is and set it aside. In that case, you are welcome to adapt the official patch to the branch(es) of interest and send it to the stable team, mentioning the upstream commit (see the documentation on how to request backports to stable branches).
Thanks, Miquèl
Thanks in advance!
On Thu, Jun 22, 2023 at 6:46 AM Kegl Rohit keglrohit@gmail.com wrote:
After reverting the revert :), the data corruption no longer occurred!
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi...
On Wed, Jun 21, 2023 at 7:55 PM Kegl Rohit keglrohit@gmail.com wrote:
OK, looking at gpmi-nand.c in 5.10.184:
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
	(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)
hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
And then in 5.19 (from commit 0fddf9ad06fd9f439f137139861556671673e31c, the upstream source of the patch): https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673e...
hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096));
This could be the cause: DIV_ROUND_UP(n, d) divides and rounds up, whereas busy_timeout_cycles * 4096 is a multiplication!
The backport is wrong because commit cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d was reverted in the 5.10 tree but not in mainline: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi... As a result, 5.10.184 combines the new, much larger cycle counts with the old multiplication, and the "hw->timing1 ..." line is wrong!
I will test this tomorrow.
On Wed, Jun 21, 2023 at 5:26 PM han.xu han.xu@nxp.com wrote:
On 23/06/21 04:27PM, Kegl Rohit wrote:
Hello!
We are using an i.MX7D with the RT stable kernel tree.
After upgrading to v5.10.184-rt90, the rootfs UBIFS MTD partition got corrupted. https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/?...
After reverting the latest patch (e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition did not get corrupted. https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi...
The commit message states that the timeout calculation was changed. Here are the calculated `busy_timeout_cycles` values before (_old) and after (_new) the patch:
[ 0.491534] busy_timeout_cycles_old 4353
[ 0.491604] busy_timeout_cycles_new 1424705
[ 0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
[ 0.492310] nand: Macronix MX30LF4G28AC
[ 0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 112
[ 0.492488] busy_timeout_cycles_old 4353
[ 0.492493] busy_timeout_cycles_new 1424705
[ 0.492863] busy_timeout_cycles_old 2510
[ 0.492872] busy_timeout_cycles_new 350000
The new timeouts are much higher. Higher timeouts should not be a problem in themselves (lower ones could be), but with these high timeouts gpmi-nand is broken for us.
For now we simply reverted the change. The new calculations seem to be fragile: a previous "fix" backport was already reverted because of data corruption. https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commi...
Any guesses why the high timeout causes issues?
A high timeout combined with the wrong calculation may overflow and cause the DEVICE_BUSY_TIMEOUT register to become 0.
Thanks in advance!
Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/