Is there really a need to merge the compatible strings in your case?
Well, you could just require duplicating an overlay N times for N bases, but that doesn't scale.
I think my question was more along the lines of "is there actually a reason why the combined compatible string needs to be that accurate" (i.e. contain the right "socvendor,mysoc-rev1" or "socvendor,mysoc-rev2"). What is the purpose of having that string in there? It wouldn't be used for matching which DTB(O) to load in this case, because we're already past that step. It's not used by the kernel directly for anything, as far as I know. Is it used by userspace programs which parse /proc/device-tree/compatible in order to detect which SoC revision they're running on? That's the only use case I can think of that really remains, but I'd argue that the compatible string is a pretty poor vehicle for things like that (a `soc-revision = <1>` property somewhere would be easier to parse and use). So if this is the only problem, I'd say maybe just don't use the board compatible string for that, and don't expect to be able to find such details in there accurately.

But if we do think it needs to be that accurate for some reason, then I'd suggest that platforms like this should simply have their bootloaders rewrite the compatible string into the right format with all the necessary information manually, rather than expect the overlay application process to create it correctly. Such platforms should be rather few: only those that really have a socketable SoC whose variants are so compatible that, other than this identification information itself, they don't require any device tree differences.
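To illustrate why a dedicated property would be easier for userspace than the compatible string, here is a rough sketch of both approaches. It relies on the standard flattened device tree encodings (a compatible property is a sequence of NUL-terminated strings, and a cell such as `<1>` is a 32-bit big-endian integer). The `soc-revision` property and the "socvendor,mysoc-revN" naming are just the hypothetical examples from above, and the function names are made up for illustration.

```python
import struct

def parse_compatible(raw):
    """Split a 'compatible' property (NUL-terminated strings) into a list."""
    return [s.decode("ascii") for s in raw.split(b"\0") if s]

def soc_revision_from_compatible(raw, prefix="socvendor,mysoc-rev"):
    """Scan every compatible entry for a '<prefix><digits>' match.

    The prefix is the hypothetical naming scheme from this thread; real
    platforms would each need their own string-matching rules.
    """
    for entry in parse_compatible(raw):
        suffix = entry[len(prefix):]
        if entry.startswith(prefix) and suffix.isdigit():
            return int(suffix)
    return None

def soc_revision_from_property(raw):
    """Decode a single-cell property such as soc-revision = <1>."""
    (value,) = struct.unpack(">I", raw)
    return value
```

On a running system the raw bytes for the first approach would come from something like `open("/proc/device-tree/compatible", "rb").read()`. Where a `soc-revision` property would live in the tree is left open here, since no such binding exists as far as this thread is concerned.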
There is a proposal here[1]. It's simple, but I do wonder if looking at the root compatible only is too narrow of a view. An overlay could target a connector compatible for example.
This seems to just assume that every overlay matches exactly one base tree, which kinda defeats the purpose of overlays in this case (sharing data between multiple base trees). For our purposes we need a much more complicated system that is able to stitch together an arbitrary number of overlays based on identifiers specific to our platform.
However, it isn't that simple. For example, when the Hummingboard2 is used with the iMX6Q SoC, there's a SATA device present at the SoC level that needs Hummingboard2 specific properties to tune the signal waveform. However, iMX6DL doesn't have this SATA device in silicon, so the node doesn't exist in the base SoC DT file. The situation is the same for Hummingboard, but the tuning parameters, being board specific, are different.
I think you would solve this simply by having more overlays? In that situation you can have an imx6q.dtbo, imx6dl.dtbo and hummingboard2.dtbo for the generic nodes and properties relating to each of these components, and then an imx6q-hummingboard2.dtbo specifically for the SATA tuning parameters of that SoC+board combination. Your bootloader then just needs to figure out which of those to load for which platform. (Of course there also has to be a toplevel DTB, so if you don't have any further revision or SKU differentiation above that, then imx6q-hummingboard2.dtb could simply be your toplevel DTB, containing those tuning parameters, and the rest could be overlays. imx6dl-hummingboard2.dtb would then be an empty toplevel DTB (save for the compatible string) if everything else that makes up the platform gets provided by the overlays.)
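As a rough sketch of the bootloader side of this, assuming the variant where the toplevel DTB is named after the SoC+board combination and the SoC and board overlays are shared: the function name, the `available` set, and the choice to apply the SoC overlay before the board overlay are all assumptions made for illustration, not part of any existing bootloader.

```python
def select_device_trees(soc, board, available):
    """Return (toplevel DTB, ordered overlay list) for a SoC+board pair.

    'available' is the set of file names the bootloader can see.
    Assumption: the SoC overlay is applied before the board overlay so
    that board nodes can reference nodes the SoC overlay provides.
    """
    toplevel = f"{soc}-{board}.dtb"
    overlays = [f"{soc}.dtbo", f"{board}.dtbo"]
    missing = [name for name in [toplevel] + overlays if name not in available]
    if missing:
        raise FileNotFoundError("missing device tree files: " + ", ".join(missing))
    return toplevel, overlays


# Example using the file names from the paragraph above.
files = {
    "imx6q-hummingboard2.dtb", "imx6dl-hummingboard2.dtb",
    "imx6q.dtbo", "imx6dl.dtbo", "hummingboard2.dtbo",
}
print(select_device_trees("imx6dl", "hummingboard2", files))
```

How the bootloader learns the `soc` and `board` identifiers in the first place (fuses, EEPROM, strapping pins) is platform specific and deliberately left out.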
The other issue would be the /model property - for example:
model = "SolidRun HummingBoard2 Solo/DualLite";
model = "SolidRun HummingBoard2 Solo/DualLite (1.5som+emmc)";
model = "SolidRun HummingBoard2 Solo/DualLite (1.5som)";
model = "SolidRun HummingBoard Solo/DualLite";
model = "SolidRun HummingBoard2 Dual/Quad";
model = "SolidRun Cubox-i Solo/DualLite";
These can also go in the toplevel DTB. Basically, the toplevel DTB can always refer to the most specific cross product of components and contain properties like these that only make sense for the specific combination. There will be a lot of toplevel DTB files, but they will be for the most part very small. The "meat" of the device tree is factored out into overlays that can be shared by multiple toplevel DTBs. The benefit is smaller image size for platforms that need to bundle DTBs for all possible variations in their kernel image. Whether that's worth it or not is up to each platform. We're not asking to make these overlays mandatory. We're just saying that we have platforms that do need (and are in part already using) something like this in order to deal with the sheer scale of supported platform variations, and we'd like there to be an upstream standard for it.
I also don't see a sensible way without something like the above for a boot loader to know the filenames of each of the components for a platform - and it would need to be told the order to glue those components together.
I think platform identification and matching which overlays to apply should remain outside the scope of the device tree itself. The main reason is that you usually want to compress DTB(O) files for efficiency, and when you have a large list of overlays in a kernel image you don't want to have to decompress them all just to determine which ones to load for the current platform. So I think it's generally better to let bootloaders come up with their own scheme to store this information inside their file systems or whatever other data structures they use to find these DTB(O) files.
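To make the decompression argument concrete, here is a minimal sketch of a bootloader-side index that lives outside the DTB(O) files. Everything in it is hypothetical: the `MANIFEST` layout, the platform identifiers, and the use of zlib as the compression format are assumptions for illustration only.

```python
import zlib

# Hypothetical index kept by the bootloader, outside the compressed blobs:
# platform ID -> files to load, in application order.
MANIFEST = {
    "imx6q-hummingboard2": [
        "imx6q-hummingboard2.dtb", "imx6q.dtbo", "hummingboard2.dtbo",
    ],
    "imx6dl-hummingboard2": [
        "imx6dl-hummingboard2.dtb", "imx6dl.dtbo", "hummingboard2.dtbo",
    ],
}

def load_for_platform(platform_id, store):
    """Decompress only the blobs the manifest lists for this platform.

    'store' maps file names to compressed bytes. Blobs belonging to
    other platforms are never touched.
    """
    return [(name, zlib.decompress(store[name])) for name in MANIFEST[platform_id]]
```

The point of the design is that the lookup is a plain table access, so the cost of selecting files does not grow with the number or size of the compressed overlays in the image.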