On Mon, 8 Dec 2025, Christian Marangi wrote:
On Mon, Dec 08, 2025 at 03:29:01PM +0200, Ilpo Järvinen wrote:
On Sun, 7 Dec 2025, Christian Marangi wrote:
Commit 900730dc4705 ("wifi: ath: Use of_reserved_mem_region_to_resource() for "memory-region"") uncovered a massive problem with the usage of resource_size() helper.
The reported commit caused a regression with ath11k WiFi firmware loading and the change was just a simple replacement of duplicate code with a new helper of_reserved_mem_region_to_resource().
On reworking this, in the commit also a check for the presence of the node was replaced with resource_size(&res). This was done following the logic that if the node wasn't present then it's expected that also the resource_size is zero, mimicking the same if-else logic.
So what's the actual resource that causes this ath11 regression? You seem possess that knowledge as you knew to create this patch.
I'm not gatekeeping the information. Just trying to get involved as much people as possible in checking this.
This helper is used in generic OF and PCI code and since there is the combo 2008 code + no comments, then it's sensible to make sure no confusion was applied across the year.
of_reserved_mem_region_to_resource() does use resource_set_range() so how did the end address become 0? Is there perhaps some other bug being covered up with this patch to resource_size()?
This IS EXACTLY the problem HERE. There is the assumption that resource_size() MUST be used ALWAYS after a resource struct is used.
But this is not alywas the case... nothing stops to call resource_size() on a a local
struct resource res = {}
and assume resource_size() returning 0 (as struct resource is all zero and assuming logically the size to be zero) (this assumption has been already proved to be wrong by the previous message but it's very common to assume that an helper called on an all zero struct returns 0 if the subject is about size or range of address)
I guess you might have written this before reading the entire email as the WARN_ON_ONCE() I suggested would have caught the caller side bug(s).
The ath11k assumes res was setup by of_reserved_mem_region_to_resource() that appearently wasn't the case (a logic bug introduced in the commit 900730dc4705 ("wifi: ath: Use of_reserved_mem_region_to_resource() for "memory-region""), and then calls resource_size() without first making sure the resource is valid.
There are at least two, if not three, problems on the caller side here.
This was also the reason the regression was mostly hard to catch at first sight as the rework is correctly done given the assumption on the used helpers.
BUT this is actually not the case. On further inspection on resource_size() it was found that it NEVER actually returns 0.
Even if the resource value of start and end are 0, the return value of resource_size() will ALWAYS be 1, resulting in the broken if-else condition ALWAYS going in the first if condition.
This was simply confirmed by reading the resource_size() logic:
return res->end - res->start + 1;
???
resource_size() does return 0 when ->start = 0 and ->end = - 1 which is the correctly setup of a zero-sized resource (when flags are non-zero).
Again IF the resource struct is correctly init, in the case of kmalloc-ed stuff or local variables init to zero, then you quickly trigger the bug as both start and end will be zero.
Also I honestly found some difficulties finding where end is set to -1.
git grep -e '->end = .* - 1'
A more complex pattern or Coccinelle can find more for you if you want to find multiline or other more complex constructs.
Those mostly predate the init helpers (I did have a series to convert them to resource_set_range() but DEFINE_RES() got introduced in the meantime too so I'd have needed to adapt many of the changes so I left the series to rotten as it was always a cleanup side-project to me and I've more on my plate than I've time for currently even in the main resource assignment algorithm).
I actually found case in OF code where both start and end are set to -1 resulting.
Can you point me where is this init code?
As noted below, DEFINE_RES*(), resource_set_range() and resource_set_size() in include/linux/ioport.h are your friends here. In general, DEFINE_RES is preferred but when only addresses of a resource are adjusted dynamically, those C functions are better suited for the task.
Given the confusion,
There's lots of resource setup code which does set resource end address properly by applying -1 to it.
also other case of such usage were searched in the kernel and with great suprise it seems LOTS of place assume resource_size() should return zero in the context of the resource start and end set to 0.
Quoting for example comments in drivers/vfio/pci/vfio_pci_core.c:
/* * The PCI core shouldn't set up a resource with a * type but zero size. But there may be bugs that * cause us to do that. */ if (!resource_size(res)) goto no_mmap;This place does not tell you what ->end is expected to be set? Where did you infer that information?
I suspect this code would want to do resource_is_assigned() (which doesn't exists yet but would check only res->parent != NULL) or base the check on IORESOURCE_UNSET|IORESOURCE_DISABLED.
Often using resourse_size() (or a few other address based checks) in drivers is misguided, what drivers are more interested in is if the resource is valid and/or properly in the resource tree (assigned by the PCI core which implies it's valid too and has a non-zero size), not so much about it's size. As you can see, size itself not even used here at all, that is, this place never was interested in size but uses it as a proxy for something else (like also many other drivers do)!
Yes I agree that resource_size() is not the correct way to check if a resource is valid but it seems to be used for this task in some code.
It really seems resource_size() was tought with the assumption that resource struct was always correctly initialized before calling it and never set to zero.
But across the year this got lost and now there are lots of driver that assume resource_size() returns 0 if start and end are also 0.
Who creates such resources?
It's more a matter of using resource_size() in place where struct resource is optionally used.
Yes but it is a latent logic bug (using the resource size as the proxy for proper validity check in the logic).
Using these proxies (in general) is what makes it hard to make some resource change I'd actually want to do in the PCI core so I don't have have much love for it (I'll probably eventually have to audit them all myself :-(). As a result, we currently have the unfortunate situation that a resizable BAR that cannot be initially assigned is lost forever. The resource had to be reset on assignment failure to keep these proxies agreeing with the state of the resource, as endpoint drivers would be confused otherwise, so making space by resizing other BARs will not bring the other BAR back.
This proxy problem actually extends beyond mere resource_size(). There are also start and end address checks and pci_resource_len() checks similarly used as proxies. None of those proxies remain valid if PCI core stops to reset resource addresses and flags.
If flags are non-zero, those places should be fixed (I'm currently fixing one case from drivers/pci/probe.c thanks to you bringing this up :-)) but I think the number of places to fix are fewer than you seem to think. I read though all end assignments in PCI core and found only this single(!) problem there.
Thanks a lot for checking the PCI code, that is one of my concern subsystem where this might be problematic. The other is the OF code.
Unfortunately I'm no OF expert but at least drivers/of/ didn't have anything interesting when grepping with:
git grep -e '->end = ' -e '.end = ' -- drivers/of
To better handle this and make resource_size() returns correct value in such case, add a simple check and return 0 if both resource start and resource end are zero.
Cc: Rob Herring (Arm) robh@kernel.org Cc: stable@vger.kernel.org Fixes: 1a4e564b7db9 ("resource: add resource_size()") Signed-off-by: Christian Marangi ansuelsmth@gmail.com
include/linux/ioport.h | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 9afa30f9346f..1b8ce62255db 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -288,6 +288,9 @@ static inline void resource_set_range(struct resource *res, static inline resource_size_t resource_size(const struct resource *res) {
- if (!res->start && !res->end)
return 0;This looks wrong to me.
Yes it's indded wrong as I didn't think of case where start and end can match and correctly return a size of 1.
Lets try to fix the resource setup side instead. The real problem is in any code that sets ->end to zero when it wanted to set resource size to zero, which are not the same thing. To make it simpler (no need to manually apply -1 to the end address) and less error prone, we have helpers resource_set_range/size() and DEFINE_RES() that handle applying -1 correctly.
OK! Thanks for the pointer so there is a DEFINE_RES() for local variables!
- return res->end - res->start + 1;
} static inline unsigned long resource_type(const struct resource *res)
Taking this suggestion from your other email:
if (!res.flags && !res.start && !res.end) return 0;
IMO, the caller shouldn't be calling resource_size() at all when !res.flags as that indicates the caller is confused and doesn't really know what it is even checking for. If a driver does even know if the resource is valid (has a valid type) or not, it should be checking that _first_ (it might needs the size for valid resources but it should IMO still check validity of the resource first).
Also, as mentioned above, some/many drivers are not fundamentally interested in validity itself but whether the resource is assigned to the resource tree.
For valid resources, this extra check is entirely unnecessary.
I'd be supportive of adding
WARN_ON_ONCE(!res.flags)
...here but that will be likely a bit too noisy without first auditting many places (from stable people's perspective a triggering WARN_ON() => DoS). But that's IMO the direction kernel code should be heading.
I'm more than happy to send a v2 of this fixing ath11k and adding the WARN_ON_ONCE with res.flags.
Seems the only correct way to track this down and improve this in the next kernel versions.
Also thanks to Andy and you for explaining this and reasurme me there wasn't a big bug around this helper but just some minor logical confusion.
Do you agree on the plan for v2? (we can assume the noise won't be so much as I expect only SOME driver to be affected by this problem) (I hope)
Yes please to fixing ath11k!
And if you're fine with dealing all the reports that will get directed to you because of adding that WARN_ON_ONCE() into resource_size() I'd be very happy of having to not do it myself eventually but please do realize it may be more than a few reports over a few next kernel releases. ;-)
That being said, I just tested on one machine here and it seems to not fire even once so we could get more lucky than I was assuming.
It might also be helpful to add kernel doc to resource_size() and mention it should not be used as a proxy for other things as we've discussed here.
On contrary to what I said earlier, I just realized the series adding resource_assigned() got accepted [1] in this cycle so it's now available in Linus' tree.
[1] https://lore.kernel.org/all/69339e215b09f_1e0210057@dwillia2-mobl4.notmuch/