Hi Francesco,
francesco@dolcini.it wrote on Fri, 2 Dec 2022 12:23:37 +0100:
- u-boot list
On Fri, Dec 02, 2022 at 11:53:27AM +0100, Miquel Raynal wrote:
francesco@dolcini.it wrote on Fri, 2 Dec 2022 11:24:29 +0100:
On Fri, Dec 02, 2022 at 11:12:43AM +0100, Francesco Dolcini wrote:
On Fri, Dec 02, 2022 at 10:14:18AM +0100, Miquel Raynal wrote:
francesco@dolcini.it wrote on Fri, 2 Dec 2022 08:19:00 +0100:
From: Francesco Dolcini <francesco.dolcini@toradex.com>
Add a fallback mechanism to handle the case in which #size-cells is set to <0>. According to the DT binding the nand controller node should have set it to 0 and this is not compatible with the legacy way of specifying partitions directly as child nodes of the nand-controller node.
I understand the problem, I understand the fix, but I have to say, I strongly dislike it :) Touching an mtd core driver to fix a single broken use case like that is... problematic, to say the least.
I noticed it just 2 days after this patch was backported to a stable kernel. I am only the first one to notice; we are not talking about a single use case.
I am sorry but if a 6.0 kernel breaks because:
Kernel 6.0 is not the only one currently broken. This patch is going to be backported to every stable kernel given the Fixes tag it has.
If you really want to work around U-Boot, either you revert that patch or you just fix the DT description instead. The parent/child/partitions scheme has been enforced for maybe 5 years now and for a good reason: a NAND controller with partitions does not make _any_ sense. There are plenty of examples out there, imx7-colibri.dtsi has received many updates since its introduction (for the better), so why not this one?
I can and I will update imx7-colibri.dtsi (patch coming),
:thumb_up:
but is this good enough given the kind of boot failure regression this introduces? We are going to have old U-Boot around that will not work with it, and the
Just another piece of information, support for the partitions node in U-Boot was added in version v2022.04 [1], we are not talking about ancient old legacy stuff.
If it is so recent, then this is what needs to be fixed, and it should not bother "many" people because 2022.04 is not so old.
So I am a bit lost, IIUC what is currently broken is:
- U-Boot > 2022.04 and any version of Linux with the backport?
If I add the partitions node as a child of my nand controller, as I was planning to do and I wrote 10 lines above, I will create a new flavor of non-booting system with U-Boot older than v2022.04 :-/
I think there is a little confusion here. You are referring to the NAND
I guess I have not explained myself well enough :-)
Ok, there is still some confusion. Even though I think your logic still applies, I want to emphasize how wrong it is to define partitions in the NAND _controller_ node rather than the NAND _chip_ node. And I think this might have an impact on our final choice.
U-Boot is creating the partitions in the dtb, they are not defined in the source dts file (this is common practice with multiple boards).
That fdt_fixup_mtdparts() thing is a mistake. The original idea is:
1. Define wrong nodes in your DT
2. Fix your DT at run time in U-Boot
3. Provide the "fixed" DT to Linux
Now step #2 produces a wrong FDT. So what, should we obscure the OF partition driver in Linux even more to work around it? At most, what we can do is warn the user so that people don't lose time understanding what happens, but I am against supporting this, ever.
Before v2022.04 it was always updating the nand-controller node, starting from v2022.04 if there is a dedicated `partitions` node it uses it.
Sounds reasonable.
This is just the reverse of what ofpart_core.c is doing (if the partitions node is there it assumes the partitions should go into it, otherwise it proceeds with the legacy way).
Yes, that's how we handle legacy bindings.
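For reference, the two layouts being distinguished here look roughly like this. It is only a sketch: the node names flash-legacy and flash-current, the partition label and the offsets are made up for illustration and are not taken from any real board file.

```dts
/dts-v1/;

/ {
	/* Legacy layout: partitions are direct children of the flash node,
	 * which therefore needs non-zero #address-cells and #size-cells. */
	flash-legacy {
		#address-cells = <1>;
		#size-cells = <1>;

		partition@0 {
			label = "example";	/* hypothetical label */
			reg = <0x0 0x80000>;	/* hypothetical offset and size */
		};
	};

	/* Current layout: a dedicated "partitions" container carries its own
	 * cell sizes, and the partition reg values are read against those. */
	flash-current {
		partitions {
			compatible = "fixed-partitions";
			#address-cells = <1>;
			#size-cells = <1>;

			partition@0 {
				label = "example";
				reg = <0x0 0x80000>;
			};
		};
	};
};
```

With the second layout, the cell sizes of the parent node no longer influence how the partition entries are parsed.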
Let's have a concrete example with colibri-imx7.
Current status:
- The nand-controller node does not include any partitions child, so any U-Boot version will just add the partitions directly as children of the nand controller. This is where I am hitting this boot regression now.
Not exactly. It worked until now because your original DT already included #size-cells = <1> I believe. It does not do that anymore and that is why you get your boot regression: because the DT was modified.
The reason why the DT got modified however is interesting. The commit log says the goal is to comply with modern bindings, which is great. But if you break how your board boots, then you should probably not do that. And if we really care about complying with the bindings, there is something much more interesting than fixing a single property: distinguishing the NAND controller vs. the NAND chip(s), which has been enforced since 2016 (which probably predates the imx7-colibri.dtsi, but whatever): 2d472aba15ff ("mtd: nand: document the NAND controller/NAND chip DT representation")
Potential change I envisioned here:
- I add the partitions node to the nand-controller, e.g.
--- a/arch/arm/boot/dts/imx7-colibri.dtsi
+++ b/arch/arm/boot/dts/imx7-colibri.dtsi
@@ -380,6 +380,12 @@ &gpmi {
 	nand-on-flash-bbt;
 	pinctrl-names = "default";
 	pinctrl-0 = <&pinctrl_gpmi_nand>;
+	partitions {
+		compatible = "fixed-partitions";
+		#address-cells = <1>;
+		#size-cells = <1>;
+	};
 };
- U-Boot >= v2022.04 will just work fine creating the partitions as currently described in the bindings.
- U-Boot < v2022.04 will still create the partitions as children of the nand-controller node. Linux will see that a `partitions` node exists but it will be empty, leading to a boot failure if MTD is used as the boot device.
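To make the second case concrete, the tree handed to Linux might look roughly as follows. This is a fragment under assumptions, not a dump from a real board: the partition label and offsets are invented, and the exact nodes written by fdt_fixup_mtdparts() depend on the mtdparts configuration in use.

```dts
&gpmi {
	/* From the updated DTS: present, but never filled in by an
	 * older U-Boot, so it reaches Linux empty. */
	partitions {
		compatible = "fixed-partitions";
		#address-cells = <1>;
		#size-cells = <1>;
	};

	/* Added at run time by U-Boot < v2022.04, next to the container
	 * instead of inside it (hypothetical label and offsets). */
	partition@0 {
		label = "example";
		reg = <0x0 0x80000>;
	};
};
```

Since a `partitions` node exists, the parser looks only inside it, finds nothing, and the entries sitting next to it are ignored.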
controller node, the commit refers to the NAND chip node. What this commit does looks fine because it just tries to use the partitions {} node rather than the NAND chip node and if the partitions {} node already exists, I expect #address-cells and #size-cells to be defined and be != 0 already.
Yes, this commit is perfectly fine, I agree.
The reality is that people are using newer kernels with older U-Boot versions, and I do not think that deliberately breaking this use case is what the Linux kernel should do.
Agreed.
I do not think that I can push a change in the DTS that will break booting any board using an older U-Boot.
That's however the initial cause of this discovery. A DT change broke your boot flow. I'm saying "your" boot flow because I am not sure it affects "any" board.
For now it only affects the imx7 colibri boards because of: 753395ea1e45 ("ARM: dts: imx7: Fix NAND controller size-cells")
But all these boards could be affected in the same way because of some machine code playing with fdt_fixup_mtdparts():
* arch/arm/mach-uniphier/fdt-fixup.c
* board/compulab/cm_fx6/cm_fx6.c
* board/gateworks/gw_ventana/gw_ventana.c
* board/isee/igep003x/board.c
* board/isee/igep00x0/igep00x0.c
* board/phytec/phycore_am335x_r2/board.c
* board/st/stm32mp1/stm32mp1.c
* board/toradex/colibri-imx6ull/colibri-imx6ull.c
* board/toradex/colibri_imx7/colibri_imx7.c
* board/toradex/colibri_vf/colibri_vf.c
That's of course far too many possible failures.
I still strongly disagree with the initial proposal but what I think we can do is:
1. To prevent future breakages: Fix fdt_fixup_mtdparts() in u-boot. This way newer U-Boot + any kernel should work.
2. To help track down situations like that: Keep the warning in ofpart.c but continue to fail.
3. To fix the current situation: Immediately revert this commit (and prevent it from being backported):
753395ea1e45 ("ARM: dts: imx7: Fix NAND controller size-cells")
This way your own boot flow is fixed in the short term.
4. There is no reason to partially fix a DT like the above commit did, other than to avoid warnings emitted by the DT check tools. If complying with modern bindings is a goal (and I think it should be), then we can modernize this DT without breaking the boot flow: Instead of only setting #size-cells = <0>, you can also define in your DT a subnode to describe the NAND chip. NAND chips are not supposed to have #size-cells properties, but in the past they did, which means #address-cells and #size-cells are allowed (and marked deprecated in the schema). So in practice, the dt-schema will not warn you if they are there, which means you can still set #size-cells = <1>.
Please mind, the tools have been updated very recently to match what I am describing above, so they will likely still report errors until v6.2-rc1, see: https://lore.kernel.org/linux-mtd/20221114090315.848208-1-miquel.raynal@boot...
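A rough sketch of what point 4 could look like in the board file. This is an untested fragment built on assumptions: the `nand@0` name and `reg = <0>` (chip select 0) follow the generic NAND controller binding, and whether the gpmi driver and U-Boot's fixup actually pick up a chip subnode on this board is not established in this thread.

```dts
&gpmi {
	/* Controller level: children are NAND chips addressed by chip select */
	#address-cells = <1>;
	#size-cells = <0>;

	nand@0 {
		reg = <0>;	/* assumed chip select */

		/* Deprecated at chip level, but per the discussion above
		 * still accepted by the schema. */
		#address-cells = <1>;
		#size-cells = <1>;
	};
};
```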
Does this sound reasonable?
Thanks, Miquèl