Hi Marek,
marex@denx.de wrote on Thu, 15 Dec 2022 08:45:33 +0100:
On 12/15/22 08:16, Miquel Raynal wrote:
Hi Marek & Francesco,
Hi,
marex@denx.de wrote on Mon, 5 Dec 2022 17:25:11 +0100:
On 12/5/22 14:49, Miquel Raynal wrote:
Hi Francesco,
Hi,
francesco@dolcini.it wrote on Mon, 5 Dec 2022 12:26:44 +0100:
>> On Fri, Dec 02, 2022 at 06:08:22PM +0100, Marek Vasut wrote: But here I would say this is a firmware bug and it might have to be handled like a firmware bug, i.e. with fixup in the partition parser. I seem to be changing my opinion here again.
I was thinking about this over the weekend, and I came to the following ideas:
- we need some improvement to the fixup we already have in the partition parser. We cannot ignore the FDT produced by U-Boot, as bad as it is.
- the proposed fixup is fine for the immediate need, but it is not going to be enough to cover the general issue with the U-Boot generated partitions. U-Boot might keep generating partitions as direct children of the nand controller even when a partitions{} node is available. In this case the current parser just fails, since it looks only into the partitions{} node and finds it empty.
- the current U-Boot only handles partitions{} as a direct child of the nand-controller, the nand-chip is ignored. This is not the way it is supposed to work. U-Boot code would need to be improved.
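To make the second point concrete, here is a rough model of the lookup behaviour described above. It is only a sketch in Python, not the actual ofpart.c code: the dict-based tree, the `find_partitions` helper and the `partition@` name check are all assumptions made for illustration.

```python
def find_partitions(nand_node):
    """Toy model of the lookup described above (not the real ofpart.c).

    nand_node is a plain dict standing in for a device tree node;
    child nodes are nested dicts keyed by node name.
    """
    container = nand_node.get("partitions")
    # Assumption: when a partitions{} child exists, only that node is
    # searched; the direct children of the controller are ignored.
    search = container if container is not None else nand_node
    return {
        name: child
        for name, child in search.items()
        if name.startswith("partition@") and isinstance(child, dict)
    }


# U-Boot added partitions as direct children, but partitions{} exists
# and is empty, so nothing is found.
mixed = {"partitions": {}, "partition@0": {"label": "boot"}}

# Legacy layout without a partitions{} node: direct children are found.
legacy = {"partition@0": {"label": "boot"}}
```

With this model, `find_partitions(mixed)` comes back empty while `find_partitions(legacy)` returns the `partition@0` entry, which is the failure mode the second bullet describes.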
I've been thinking about it this weekend as well, and the current fix which "just sets" s_cell to 1 seems risky to me: it is typically the type of quick & dirty fix that might even break other boards (nobody knew that U-Boot's current logic expected #size-cells to be set in the DT, what if another "broken" DT expects the opposite...)
Then with the current configuration, such a broken DT would not work, since the current DT does set #size-cells=<1> (wrongly).
, not to mention potential issues with big storage devices (> 4GiB).
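For reference, the reason the #size-cells value matters so much can be sketched as follows. This is an illustrative Python model of how a `reg` property is split into 32-bit cells, not kernel code; `decode_reg` is a hypothetical helper and the cell values are made up.

```python
def decode_reg(cells, address_cells, size_cells):
    """Split a flat list of 32-bit cells into (offset, size) pairs."""
    stride = address_cells + size_cells
    if stride == 0 or len(cells) % stride:
        raise ValueError("reg length does not match the cell counts")

    def join(chunk):
        # Concatenate big-endian 32-bit cells into one integer.
        value = 0
        for cell in chunk:
            value = (value << 32) | (cell & 0xFFFFFFFF)
        return value

    entries = []
    for i in range(0, len(cells), stride):
        offset = join(cells[i:i + address_cells])
        size = join(cells[i + address_cells:i + stride])
        entries.append((offset, size))
    return entries


# The same two cells read with #size-cells = <1> and with <0>.
reg = [0x0, 0x80000]
as_one_partition = decode_reg(reg, 1, 1)   # one entry: offset 0, size 0x80000
as_two_addresses = decode_reg(reg, 1, 0)   # two entries, both with size 0

# A single size cell cannot express 4GiB or more; two cells can.
big = decode_reg([0x0, 0x0, 0x1, 0x0], 2, 2)  # offset 0, size 1 << 32
```

The same bytes give either one partition or two sizeless addresses depending on the parent's #size-cells, and forcing a single size cell caps each partition size below 4GiB.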
All in all, I really think we should revert the DT change now. Reverting has little to no drawback besides a dt_binding_check warning, and it gives us time to deal with it properly (both in U-Boot and Linux).
I am really not happy with this, but if that's marked as an intermediate fix, go for it.
How do we deal with this in the long run, however? A parser-side fix like this one, maybe with better heuristics?
Yesterday, while talking about an ACPI mis-description which needed fixing, I realized that fixing up what the firmware provides to Linux should preferably be handled as early as possible. So my first idea was to avoid using the broken "fixup mtdparts" function in U-Boot, and I am still convinced this is what we should do as a priority. However, as rightly pointed out in this thread, we need to take care of the case where someone would use a newer DT (let's say, with the reverted change reverted again) with an old U-Boot. I am still against piggy hacks in the generic ofpart.c driver, but what we could do, however, is a DT fixup in the init_machine (or the dt_fixup) hook for imx7 Colibri, very much like this: https://elixir.bootlin.com/linux/latest/source/arch/arm/mach-mvebu/board-v7.... Plus a warning there saying "your dt is broken, update your firmware".
This does not work, because the old U-Boot fixup_mtdparts() may be applied on any machine,
No: https://elixir.bootlin.com/u-boot/latest/A/ident/fdt_fixup_mtdparts
And we should do our best so that its use does not proliferate. It's not like there are half a dozen good ways to describe and forward partitions today.
it is not colibri mx7 specific. Also, new arch-side workarounds are really not welcome by the architecture maintainers as far as I can tell.
So what? Let's propose the change and see what the maintainers have to say. I am open to discussion.
As I said, it is not colibri mx7 specific, there are a few boards which might be affected, they are all clearly identifiable with a compatible. It's not the entire planet either.
So next time someone stumbles upon this issue, we can tell them "fix your bootloader", and apply the same hack in their board family (there are three or four IIRC which might be concerned some day).
There are also those machines we do not even know about which might be generating a bogus DT using an old U-Boot and fixup_mtdparts(), so, unless there is some all-arch fixup implementation, we wouldn't be able to fix them all on the arch side. I think the all-arch fixup implementation would be the driver one, i.e. this patch as it is (or maybe with some improvement).
If we don't know about them, as you say, I don't feel concerned.
If something is buggy, people will report it. We will point them in the right direction so they can fix their firmware and propose a similar fix for their case, which will involve adding a new machine compatible to the list of boards that should tweak the #size-cells property.
That would fix all cases and only have an impact on the affected boards.
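The shape of such a per-board fixup could look roughly like the sketch below. It is a Python model rather than an actual init_machine or dt_fixup hook: the compatible strings, the dict-based tree and the `fixup_size_cells` helper are hypothetical placeholders, and the check for direct `partition@` children is an assumption about how the legacy layout would be detected.

```python
import warnings

# Hypothetical list of machine compatibles known to ship the old bootloader.
AFFECTED_BOARDS = {"vendor,board-a", "vendor,board-b"}


def fixup_size_cells(tree):
    """Set #size-cells to 1 on the nand controller of affected boards only.

    Returns True when the tree was modified.
    """
    if not AFFECTED_BOARDS & set(tree.get("compatible", [])):
        return False  # board not in the list: leave the DT alone

    nand = tree.get("nand-controller")
    if nand is None or nand.get("#size-cells") == 1:
        return False

    # Assumption: only act when the bootloader put partitions directly
    # under the controller node.
    if not any(name.startswith("partition@") for name in nand):
        return False

    nand["#size-cells"] = 1
    warnings.warn("your dt is broken, update your firmware")
    return True


affected = {
    "compatible": ["vendor,board-a"],
    "nand-controller": {"#size-cells": 0, "partition@0": {"reg": [0x0, 0x80000]}},
}
other = {
    "compatible": ["vendor,unrelated"],
    "nand-controller": {"#size-cells": 0, "partition@0": {"reg": [0x0, 0x80000]}},
}
```

Only `affected` gets rewritten here; `other` is left untouched, which is the point of keying the workaround on the machine compatible instead of changing the generic parser.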
Sadly, it does only fix the known cases, not the unknown cases like downstream forks which never get any bootloader updates ever, and which you can't find in upstream U-Boot, and which you therefore cannot easily catch in the arch side fixup.
And?
Thanks, Miquèl