Hi Francesco,
francesco@dolcini.it wrote on Fri, 16 Dec 2022 13:37:31 +0100:
On Fri, Dec 16, 2022 at 12:01:55PM +0100, Miquel Raynal wrote:
marex@denx.de wrote on Fri, 16 Dec 2022 11:46:18 +0100:
On 12/16/22 08:45, Francesco Dolcini wrote:
On Thu, Dec 15, 2022 at 08:16:04AM +0100, Miquel Raynal wrote:
I am still against piggy hacks in the generic ofpart.c driver, but what we could do however is a DT fixup in the init_machine (or the dt_fixup) hook for imx7 Colibri, very much like this: https://elixir.bootlin.com/linux/latest/source/arch/arm/mach-mvebu/board-v7.... Plus a warning there saying "your dt is broken, update your firmware".
I have a couple of concerns/questions about this approach:
- Do we have a single point to handle this? Different architectures are affected by this issue, and duplicating the fixup code in multiple places does not seem like a great idea.
- If we believe that the device tree is wrong (in the i.MX7 case, because #size-cells should be set to 0 and not 1), we should not alter the FDT. Other parts of the code could rely on it being correctly set to 0 going forward.
If I understood correctly, you are proposing a fixup at the machine level that converts a valid nand-controller node definition into a "broken" one. Or did I misunderstand, and you are thinking about rewriting the whole MTD partition description from a broken definition to a proper one?
No, quite the opposite.
Either #size-cells is wrong, which makes the description totally inconsistent (if #size-cells is there, it must have a use, otherwise why keep it?), and we must fix it; or it is right and we should not touch it.
What I propose is to check, very early, whether the description is consistent on the boards known to have this problem. If the description is wrong, we fix it, and the generic parser can then do its work properly.
What if we add `nand-chip{}` children in the future? Nothing like that is implemented or described in the i.MX NAND controller schema so far, but the hardware supports it. Will this idea still work?
I think yes. I mean, moving to a

    nand-controller {
        nand-chip {
            partitions {
                part@x
                part@y
            }
        }
    }

scheme is what we should eventually find on all maintained boards, but I would say that, at the very least, the description must be coherent.
But my previous answer was only focusing on the case where you change something in the kernel or in the DT that breaks the board because of the mess fdt_fixup_mtdparts() brings.
On Thu, Dec 15, 2022 at 09:04:46AM +0100, Miquel Raynal wrote:
marex@denx.de wrote on Thu, 15 Dec 2022 08:45:33 +0100:
Sadly, it only fixes the known cases, not the unknown ones like downstream forks which never get any bootloader updates, which you can't find in upstream U-Boot, and which you therefore cannot easily catch in the arch-side fixup.
And ?
I'm not personally and directly concerned, since the machines I care about are all upstream and known; however, this is a general problem with U-Boot code that is widely used across a range of embedded products while producing a broken MTD partition list.
I think we will silently break boards and create a lot of issues for people. We would knowingly introduce regressions for users, deliberately deciding not to care and moving the problem to someone else. I do not think this is a good way to go.
What?
Let me rephrase, I was not clear enough.
Since when does my proposal break boards? My proposal leads to a situation where:
- If you have a board that has an inconsistent description but worked, it will still work.
- If you have a board that has a consistent description and worked, it will still work.
- If you have a board that has an inconsistent description and got broken *recently* by another change (typically, you "fix" the DT in Linux to comply with the bindings), then you get a warning that puts you on the right path: you update your bootloader if you can, and either way you add your machine compatible to the list of devices that need the early fix, and your board boots again.
This assumes that we can proactively catch all the affected boards. I do not believe this is realistic, hence my earlier comment about introducing regressions for users.
I really don't understand the reasoning here.
What I say is: let's fix the boards known to be incorrectly described when we break them so they continue working with a broken firmware.
What regression could this possibly bring? I don't care about catching the 2k boards out there which work but wrongly describe their partitions. If they work, they will continue working.
You and Marek say: let's blindly always change a property in the DT, no matter if the board is broken, even if we don't know if this is the right thing to do, and apply this to the entire world.
But with this approach you're not worried about regressions.
I am sorry, but this does not hold.
Thanks, Miquèl