On 12/11/25 5:15 am, Yang Shi wrote:
On 11/10/25 9:12 PM, Dev Jain wrote:
On 11/11/25 10:38 am, Yang Shi wrote:
On 11/10/25 8:55 PM, Dev Jain wrote:
On 11/11/25 10:14 am, Yang Shi wrote:
On 11/10/25 8:37 PM, Dev Jain wrote:
On 11/11/25 9:47 am, Yang Shi wrote:
>
> On 11/10/25 7:39 PM, Dev Jain wrote:
>>
>> On 05/11/25 9:27 am, Dev Jain wrote:
>>>
>>> On 04/11/25 6:26 pm, Will Deacon wrote:
>>>> On Tue, Nov 04, 2025 at 09:06:12AM +0530, Dev Jain wrote:
>>>>> On 04/11/25 12:15 am, Yang Shi wrote:
>>>>>> On 11/3/25 7:16 AM, Will Deacon wrote:
>>>>>>> On Mon, Nov 03, 2025 at 11:43:06AM +0530, Dev Jain wrote:
>>>>>>>> Post a166563e7ec3 ("arm64: mm: support large block mapping when
>>>>>>>> rodata=full"), __change_memory_common has a real chance of
>>>>>>>> failing due to split failure. Before that commit, this line was
>>>>>>>> introduced in c55191e96caa, still having a chance of failing if
>>>>>>>> it needs to allocate pagetable memory in apply_to_page_range,
>>>>>>>> although that has never been observed to be true. In general,
>>>>>>>> we should always propagate the return value to the caller.
>>>>>>>>
>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>> Fixes: c55191e96caa ("arm64: mm: apply r/o permissions of VM
>>>>>>>> areas to its linear alias as well")
>>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>>> ---
>>>>>>>> Based on Linux 6.18-rc4.
>>>>>>>>
>>>>>>>>  arch/arm64/mm/pageattr.c | 5 ++++-
>>>>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>>>>>>> index 5135f2d66958..b4ea86cd3a71 100644
>>>>>>>> --- a/arch/arm64/mm/pageattr.c
>>>>>>>> +++ b/arch/arm64/mm/pageattr.c
>>>>>>>> @@ -148,6 +148,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>  	unsigned long size = PAGE_SIZE * numpages;
>>>>>>>>  	unsigned long end = start + size;
>>>>>>>>  	struct vm_struct *area;
>>>>>>>> +	int ret;
>>>>>>>>  	int i;
>>>>>>>>  
>>>>>>>>  	if (!PAGE_ALIGNED(addr)) {
>>>>>>>> @@ -185,8 +186,10 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>  	if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
>>>>>>>>  			    pgprot_val(clear_mask) == PTE_RDONLY)) {
>>>>>>>>  		for (i = 0; i < area->nr_pages; i++) {
>>>>>>>> -			__change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>> +			ret = __change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>  					       PAGE_SIZE, set_mask, clear_mask);
>>>>>>>> +			if (ret)
>>>>>>>> +				return ret;
>>>>>>> Hmm, this means we can return failure half-way through the
>>>>>>> operation. Is that something callers are expecting to handle? If
>>>>>>> so, how can they tell how far we got?
>>>>>> IIUC the callers don't have to know whether it is half-way or not
>>>>>> because the callers will change the permission back (e.g. to RW)
>>>>>> for the whole range when freeing memory.
>>>>> Yes, it is the caller's responsibility to set the
>>>>> VM_FLUSH_RESET_PERMS flag. Upon vfree(), it will change the direct
>>>>> map permissions back to RW.
>>>> Ok, but vfree() ends up using update_range_prot() to do that and if
>>>> we need to worry about that failing (as per your commit message),
>>>> then we're in trouble because the calls to set_area_direct_map()
>>>> are unchecked.
>>>>
>>>> In other words, this patch is either not necessary or it is
>>>> incomplete.
>>>
>>> Here is the relevant email, in the discussion between Ryan and Yang:
>>>
>>> https://lore.kernel.org/all/fe52a1d8-5211-4962-afc8-c3f9caf64119@os.ampereco...
>>>
>>> We had concluded that all callers of set_memory_ro() or
>>> set_memory_rox() (which require the linear map perm change back to
>>> default, upon vfree()) will call it for the entire region
>>> (vm_struct). So, when we do the set_direct_map_invalid_noflush, it
>>> is guaranteed that the region has already been split. So this call
>>> cannot fail.
>>>
>>> https://lore.kernel.org/all/f8898c87-8f49-4ef2-86ae-b60bcf67658c@os.ampereco...
>>>
>>> This email notes that there is some code doing set_memory_rw() and
>>> unnecessarily setting the VM_FLUSH_RESET_PERMS flag, but in that
>>> case we don't care about the set_direct_map_invalid_noflush call
>>> failing, because the protections are already RW.
>>>
>>> Although we had also observed that all of this is fragile and
>>> depends on the caller doing the correct thing. The real solution
>>> should be somehow getting rid of the BBM style invalidation. Ryan
>>> had proposed some methods in that email thread.
>>>
>>> One solution which I had thought of is this: observe that we are
>>> doing an overkill by setting the linear map to invalid and then
>>> default for the *entire* region. What we can do is iterate over the
>>> linear map alias of the vm_struct *area and only change permission
>>> back to RW for the pages which are *not* RW. And those relevant
>>> mappings are guaranteed to be split, because they were changed from
>>> RW to not RW.
>>
>> @Yang and Ryan,
>>
>> I saw Yang's patch here:
>> https://lore.kernel.org/all/20251023204428.477531-1-yang@os.amperecomputing....
>>
>> and realized that currently we are splitting away the linear map
>> alias of the *entire* region.
>>
>> Shouldn't this then imply that set_direct_map_invalid_noflush will
>> never fail, since even a set_memory_rox() call on a single page will
>> split the linear map for the entire region, and thus there is no
>> fragility here which we were discussing? I may be forgetting
>> something, this linear map stuff is confusing enough already.
>
> It still may fail due to page table allocation failure when doing the
> split. But it is still fine. We may run into 3 cases:
>
> 1. set_memory_rox succeeds in splitting the whole range, then
>    set_direct_map_invalid_noflush() will succeed too
> 2. set_memory_rox fails to split, for example, it changes only a
>    partial range's permission due to page table allocation failure;
>    then set_direct_map_invalid_noflush() may
>    a. successfully change the permission back to default up to where
>       set_memory_rox failed, since that range has been successfully
>       split. This is ok since the remaining range was not actually
>       changed to ro by set_memory_rox at all
>    b. successfully change the permission back to default for the
>       whole range (for example, memory pressure is mitigated by the
>       time set_direct_map_invalid_noflush() is called). That is
>       definitely fine as well
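For reference, the vfree() reset path under discussion, as a simplified
paraphrase of vm_reset_perms() in mm/vmalloc.c (not the verbatim kernel
code). Note that neither set_area_direct_map() call checks the return
value of its callback, which is the unchecked failure Will points out
above:

static void vm_reset_perms(struct vm_struct *area)
{
	/* invalidate the linear map alias of every page in the area */
	set_area_direct_map(area, set_direct_map_invalid_noflush);
	/* ... vm_unmap_aliases() / TLB flush elided ... */
	/* then restore the default (RW) permissions */
	set_area_direct_map(area, set_direct_map_default_noflush);
}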
Correct. What I mean to imply here is that your patch will break this: if set_memory_* is applied on x till y, your patch changes the linear map alias
only from x till y, whereas set_direct_map_invalid_noflush operates on 0 till size - 1, where 0 <= x <= y <= size - 1. So it may encounter -ENOMEM
on the [0, x) range while invalidating, and that is *not* okay, because we must reset [0, x) back to default?
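To make the sequence concrete, a hypothetical example (x and y are
arbitrary page indices chosen for illustration; this assumes the patched
behaviour where the alias update is narrowed to the requested range):

#include <linux/vmalloc.h>
#include <linux/set_memory.h>
#include <linux/mm.h>

static void partial_range_hazard(void)
{
	unsigned long x = 1, y = 2;	/* 0 <= x <= y < nr_pages */
	void *p = vmalloc(8 * PAGE_SIZE);

	if (!p)
		return;
	set_vm_flush_reset_perms(p);
	/* splits/updates the linear map alias for pages [x, y] only */
	set_memory_rox((unsigned long)p + x * PAGE_SIZE, y - x + 1);
	/*
	 * vfree() -> vm_reset_perms() -> set_area_direct_map() walks the
	 * whole area, so set_direct_map_invalid_noflush() may have to
	 * split [0, x) and can fail with -ENOMEM, unchecked.
	 */
	vfree(p);
}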
I see your point now. But I think the callers need to guarantee they call set_memory_rox and set_direct_map_invalid_noflush on the same range, right? Currently the kernel just calls them on the whole area.
Nope. The fact that the kernel changes protections, and undoes those changes, on the *entire* alias of the vm_struct region protects us from the fragility we were talking about earlier.
This is what I meant by "kernel just calls them on the whole area".
Suppose you have a range from 0 till size - 1, and you call set_memory_* on a random point (page) p. The argument we discussed above is independent of p, which lets us drop our
previous (erroneous) conclusion that all of this works only because no caller does a partial set_memory_*.
Sorry, I don't follow you. What "erroneous conclusion" do you mean? You can call set_memory_* on a random point, but then set_direct_map_invalid_noflush() should be called on that random point too. The current code of set_area_direct_map() doesn't consider this case because there is no such call. Is this what you meant?
I was referring to the discussion in the linear map work - I think we had concluded that we don't need to worry about the BBM style invalidation failing, *because* no one does a partial set_memory_*.
Yes, we don't have to worry about it.
What I am saying is: we don't care whether the caller does a partial or a full set_memory_*, we are still safe, because the linear map alias change on both sides (set_memory_* -> __change_memory_common, and vm_reset_perms -> set_area_direct_map())
operates on the entire region.
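Concretely, both sides walk area->nr_pages. The first loop is from the
patch quoted above; the second is a paraphrase of what
set_area_direct_map() effectively does:

	/* arm64 change_memory_common(), rodata_full case: whole area */
	for (i = 0; i < area->nr_pages; i++)
		__change_memory_common((u64)page_address(area->pages[i]),
				       PAGE_SIZE, set_mask, clear_mask);

	/* vfree() -> vm_reset_perms() -> set_area_direct_map(): whole area again */
	for (i = 0; i < area->nr_pages; i++)
		if (page_address(area->pages[i]))
			set_direct_map_invalid_noflush(area->pages[i]);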
Yes, this is the current behavior. My patch changes change_memory_common() to do the permission update only for the range the callers requested, instead of assuming it changes the entire region, although no one currently calls set_memory_* on a partial range. Should set_area_direct_map() be aware of a potential partial-range change from set_memory_*()? Maybe. But it is only called from vfree(), which frees the entire region.
What happens if someone does something crazy, for example, calls set_memory_* on a partial range and then calls vfree()? IIUC, it is fine as well. It is still covered by the 3 cases that I mentioned in the previous email, if I'm not missing anything, right?
Assuming the caller also does set_vm_flush_reset_perms(), the 3 cases will work out.
I would like to send a patch clearly documenting this behaviour, assuming no one else finds a hole in this reasoning.
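Something along these lines, as a rough first draft (wording entirely
mine, to sit near set_area_direct_map() and/or the arm64
permission-change code):

/*
 * set_memory_*() on arm64 and the reset done by vfree()
 * (vm_reset_perms() -> set_area_direct_map()) both operate on the
 * linear map alias of the *entire* vm_struct area. Hence, by the time
 * set_direct_map_invalid_noflush() runs, every range it touches was
 * either fully split by a successful set_memory_*() call (so the
 * invalidation cannot fail there), or the split failed earlier, in
 * which case that range never left its default RW permissions and a
 * failure to invalidate/reset it is harmless.
 */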
A proper comment to explain the subtle behavior is definitely welcome.
Thanks, Yang
>
> Hopefully I don't miss anything.
>
> Thanks,
> Yang
>
>>>> Will