Hello!
Following up on the initial discussion
https://lore.kernel.org/all/20220701110341.3094023-1-s.hauer@pengutronix.de
which caused the revert commit:
Are there any plans to fix this issue for 5.10.y (and maybe other
stable branches)?
Thanks in advance!
On Thu, Jun 22, 2023 at 6:46 AM Kegl Rohit <keglrohit(a)gmail.com> wrote:
>
> After reverting the revert :), the data corruption did not happen anymore!
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/comm…
>
> On Wed, Jun 21, 2023 at 7:55 PM Kegl Rohit <keglrohit(a)gmail.com> wrote:
> >
> > ok, looking at the 5.10.184 gpmi-nand.c:
> >
> > #define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
> > (((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)
> >
> > hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
> >
> > and then 5.19 (upstream patch source 0fddf9ad06fd9f439f137139861556671673e31c)
> > https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673…
> >
> > hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles,
> > 4096));
> >
> > could be the cause. DIV_ROUND_UP is most likely a division and
> > busy_timeout_cycles * 4096 a multiplication!
> >
> > The backport is wrong, because commit
> > cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d was reverted in the 5.10 kernel
> > tree but not on mainline.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/comm…
> >
> > => now in 5.10.184 this line "hw->timing1 ..." is wrong!
> >
> > I will test this tomorrow.
> >
> > On Wed, Jun 21, 2023 at 5:26 PM han.xu <han.xu(a)nxp.com> wrote:
> > >
> > > On 23/06/21 04:27PM, Kegl Rohit wrote:
> > > > Hello!
> > > >
> > > > Using imx7d and rt stable kernel tree.
> > > >
> > > > After upgrading to v5.10.184-rt90 the rootfs ubifs mtd partition got corrupted.
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/…
> > > >
> > > > After reverting the latest patch
> > > > (e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition did
> > > > not get corrupted.
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/comm…
> > > >
> > > > The commit message states the timeout calculation was changed.
> > > > Here are the calculated timeouts `busy_timeout_cycles` before (_old)
> > > > and after the patch (_new):
> > > >
> > > > [ 0.491534] busy_timeout_cycles_old 4353
> > > > [ 0.491604] busy_timeout_cycles_new 1424705
> > > > [ 0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> > > > [ 0.492310] nand: Macronix MX30LF4G28AC
> > > > [ 0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size:
> > > > 2048, OOB size: 112
> > > > [ 0.492488] busy_timeout_cycles_old 4353
> > > > [ 0.492493] busy_timeout_cycles_new 1424705
> > > > [ 0.492863] busy_timeout_cycles_old 2510
> > > > [ 0.492872] busy_timeout_cycles_new 350000
> > > >
> > > > The new timeouts are set a lot higher. Higher timeouts should not be
> > > > an issue. Lower timeouts could be an issue.
> > > > But because of these high timeouts gpmi-nand is broken for us.
> > > >
> > > > For now we simply reverted the change.
> > > > The new calculations seem to be flaky, a previous "fix backport" was
> > > > already reverted because of data corruption.
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/comm…
> > > >
> > > > Any guesses why the high timeout causes issues?
> > >
> > > A high timeout combined with the wrong calculation may overflow and cause
> > > the DEVICE_BUSY_TIMEOUT register to become 0.
> > >
> > > >
> > > >
> > > > Thanks in advance!
> > > >
> > > > ______________________________________________________
> > > > Linux MTD discussion mailing list
> > > > http://lists.infradead.org/mailman/listinfo/linux-mtd/
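
To make the overflow discussed in this thread concrete, here is a small
user-space C sketch of the two "hw->timing1 ..." forms quoted above. It is
not the driver code. The field position (bits 31:16) and the 16-bit mask are
assumptions about the GPMI TIMING1 register layout, and a 32-bit unsigned
busy_timeout_cycles is assumed as well. The names busy_timeout_field(),
timing1_multiply(), timing1_divide() and check_logged_values() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: BUSY_TIMEOUT is a 16-bit field in bits 31:16 of TIMING1. */
#define BP_GPMI_TIMING1_BUSY_TIMEOUT 16
#define BM_GPMI_TIMING1_BUSY_TIMEOUT (0xffffu << BP_GPMI_TIMING1_BUSY_TIMEOUT)
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
	(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)

/* User-space stand-in with the same rounding as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: read the field value back out of a register word. */
static uint32_t busy_timeout_field(uint32_t timing1)
{
	return (timing1 & BM_GPMI_TIMING1_BUSY_TIMEOUT) >>
	       BP_GPMI_TIMING1_BUSY_TIMEOUT;
}

/* Form quoted from 5.10.184: multiply by 4096, then shift into the field. */
static uint32_t timing1_multiply(uint32_t busy_timeout_cycles)
{
	/* Unsigned 32-bit arithmetic wraps; the high bits are silently lost. */
	return BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096u);
}

/* Form quoted from 5.19: divide by 4096 (rounding up), then shift. */
static uint32_t timing1_divide(uint32_t busy_timeout_cycles)
{
	return BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096u));
}

/* Usage sketch with busy_timeout_cycles_new = 350000 from the boot log. */
static void check_logged_values(void)
{
	/* Multiply form: everything is shifted out, the field ends up 0. */
	assert(busy_timeout_field(timing1_multiply(350000u)) == 0);
	/* Divide form: ceil(350000 / 4096) = 86 fits the field comfortably. */
	assert(busy_timeout_field(timing1_divide(350000u)) == 86);
}
```

Under these assumptions, the logged value 350000 leaves the field at 0 in the
multiply form, which is consistent with han.xu's remark, while the divide form
stores 86.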
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     d082d48737c75d2b3cc1f972b8c8674c25131534
Gitweb:        https://git.kernel.org/tip/d082d48737c75d2b3cc1f972b8c8674c25131534
Author:        Lee Jones <lee(a)kernel.org>
AuthorDate:    Wed, 14 Jun 2023 17:38:54 +01:00
Committer:     Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Fri, 16 Jun 2023 11:46:42 -07:00

x86/mm: Avoid using set_pgd() outside of real PGD pages

KPTI keeps around two PGDs: one for userspace and another for the
kernel. Among other things, set_pgd() contains infrastructure to
ensure that updates to the kernel PGD are reflected in the user PGD
as well.

One side-effect of this is that set_pgd() expects to be passed whole
pages. Unfortunately, init_trampoline_kaslr() passes in a single entry:
'trampoline_pgd_entry'.

When KPTI is on, set_pgd() will update 'trampoline_pgd_entry' (an
8-Byte globally stored [.bss] variable) and will then proceed to
replicate that value into the non-existent neighboring user page
(located +4k away), leading to the corruption of other global [.bss]
stored variables.

Fix it by directly assigning 'trampoline_pgd_entry' and avoiding
set_pgd().

[ dhansen: tweak subject and changelog ]

Fixes: 0925dda5962e ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline")
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Lee Jones <lee(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/all/20230614163859.924309-1-lee@kernel.org/g
---
arch/x86/mm/kaslr.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 557f0fe..37db264 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(void)
 		set_p4d(p4d_tramp,
 			__p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
 
-		set_pgd(&trampoline_pgd_entry,
-			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
+		trampoline_pgd_entry =
+			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
 	} else {
-		set_pgd(&trampoline_pgd_entry,
-			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
+		trampoline_pgd_entry =
+			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
 	}
 }
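
The "+4k away" write described in the changelog can be modelled in user space.
The sketch below is a simplified illustration, not kernel code: it assumes the
user copy of a PGD entry is found by setting bit 12 of the kernel entry's
address, and it ignores any further conditions the real set_pgd() path checks.
The names user_pgdp(), mirrored_set_pgd() and stray_write_offset() are
hypothetical, and pgd_t is reduced to a plain 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u

typedef uint64_t pgd_t;	/* simplified stand-in for the kernel's pgd_t */

/* ASSUMPTION: the user copy lives in the page right after the kernel page. */
static pgd_t *user_pgdp(pgd_t *kernel_pgdp)
{
	return (pgd_t *)((uintptr_t)kernel_pgdp | PAGE_SIZE);
}

/* Hypothetical model of a setter that mirrors every update into the user PGD. */
static void mirrored_set_pgd(pgd_t *pgdp, pgd_t val)
{
	*pgdp = val;
	*user_pgdp(pgdp) = val;	/* only valid if pgdp sits in a real PGD pair */
}

/*
 * Treat an 8 KiB buffer as ordinary global data holding one lone entry at
 * offset 0. Returns the offset of the first byte modified outside that
 * entry, or 0 if nothing else was touched.
 */
static size_t stray_write_offset(int use_mirroring_setter)
{
	unsigned char *bss = aligned_alloc(2 * PAGE_SIZE, 2 * PAGE_SIZE);
	pgd_t *entry;
	size_t off, hit = 0;

	assert(bss != NULL);
	memset(bss, 0, 2 * PAGE_SIZE);
	entry = (pgd_t *)bss;

	if (use_mirroring_setter)
		mirrored_set_pgd(entry, 0x0101010101010101ull);
	else
		*entry = 0x0101010101010101ull;	/* direct assignment, as in the fix */

	for (off = sizeof(pgd_t); off < 2 * PAGE_SIZE; off++) {
		if (bss[off] != 0) {
			hit = off;
			break;
		}
	}
	free(bss);
	return hit;
}
```

In this model, stray_write_offset(1) reports a modified byte at offset 4096,
one page past the lone entry, while stray_write_offset(0) reports none. That
mirrors why the patch switches to a direct assignment.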